GitHub
This page contains the setup guide and reference information for GitHub.
Prerequisites
- List of GitHub repositories
- GitHub personal access token
Features
Feature | Supported? |
---|---|
Full Refresh Overwrite | Yes |
Full Refresh Append | Yes |
Incremental Sync Append | Yes |
Incremental Sync Append + Deduped | Yes |
Setup guide
Step 1: Obtain GitHub personal access token
Sign in to your GitHub account.
Go to Settings -> Developer settings -> Personal access tokens page.
Click Generate new token, select scopes which define the access for the token, and click Generate token.
NOTE: Your token should have at least the repo scope. Depending on which streams you want to sync, the user generating the token may need additional permissions:
- To sync Collaborators, the user who generates the personal access token must be a collaborator. To become a collaborator, they must be invited by an owner. If there are no collaborators, no records will be synced. Refer to GitHub's documentation on access permissions for details.
- Teams can be synced only by authenticated members of a team's organization. Personal user accounts and the repositories belonging to them don't have access to Teams features. In this case, no records will be synced.
- To sync Projects, the repository must have the Projects feature enabled.
Save your access token for later use.
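Before entering the token in Daspire, it can help to confirm that it carries the repo scope. The sketch below assumes a classic personal access token, for which GitHub reports the granted scopes in the X-OAuth-Scopes response header of an authenticated API call. The function names are hypothetical, and the header values are made-up examples rather than output from a real request.

```python
def parse_scopes(header_value):
    """Split an X-OAuth-Scopes header value such as 'repo, read:org' into a set."""
    return {scope.strip() for scope in header_value.split(",") if scope.strip()}


def has_required_scope(header_value, required="repo"):
    """Return True when the required scope is among the granted scopes."""
    return required in parse_scopes(header_value)


# Example header values (assumed); a real value comes from an authenticated response.
print(has_required_scope("repo, read:org"))  # True
print(has_required_scope("read:org, gist"))  # False
```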
Step 2: Set up GitHub in Daspire
Select GitHub from the Source list.
Enter a Source Name.
Authenticate with Personal Access Token. To load balance your API quota consumption across multiple API tokens, enter multiple tokens separated by commas (,).
GitHub Repositories - Enter a list of GitHub organizations/repositories, e.g. daspirehq/daspire for a single repository, or daspirehq/daspire daspirehq/daspire2 for multiple repositories. To receive data from all repositories of an organization, specify the organization with a wildcard, e.g. daspirehq/*.
CAUTION: Repositories that do not exist, or whose names are misspelled or incorrectly formatted, will be skipped.
Start date (Optional) - The date from which you'd like to replicate data for streams. For streams which support this configuration, only data generated on or after the start date will be replicated.
- These streams will only sync records generated on or after the Start Date: comments, commit_comment_reactions, commit_comments, commits, deployments, events, issue_comment_reactions, issue_events, issue_milestones, issue_reactions, issues, project_cards, project_columns, projects, pull_request_comment_reactions, pull_requests, pull_requeststats, releases, review_comments, reviews, stargazers, workflow_runs, workflows.
- The Start Date does not apply to the streams below, and all data will be synced for these streams: assignees, branches, collaborators, issue_labels, organizations, pull_request_commits, pull_request_stats, repositories, tags, teams, users.
Branch (Optional) - List of GitHub repository branches to pull commits from, e.g. daspirehq/daspire/main. If no branches are specified for a repository, the default branch will be pulled.
Max requests per hour (Optional) - The GitHub API allows a maximum of 5,000 requests per hour (15,000 for GitHub Enterprise). You can specify a lower value to limit your use of the API quota. Refer to the GitHub article Rate limits for the REST API.
Click Save & Test.
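Because badly formatted repository names are skipped rather than rejected, it may be worth checking the values before saving. The following is a minimal sketch, not Daspire's actual validation: it assumes repository entries are separated by whitespace, that owner and repository names use only letters, digits, '.', '_' and '-', and that tokens are separated by commas. All function names are hypothetical.

```python
import re

# Assumed character set for owner and repository names.
_NAME = r"[A-Za-z0-9._-]+"
_REPO_RE = re.compile(rf"^{_NAME}/(?:{_NAME}|\*)$")
_BRANCH_RE = re.compile(rf"^({_NAME})/({_NAME})/(.+)$")


def split_repositories(value):
    """Split a whitespace-separated list into (valid, skipped) entries."""
    valid, skipped = [], []
    for entry in value.split():
        (valid if _REPO_RE.match(entry) else skipped).append(entry)
    return valid, skipped


def parse_branch(entry):
    """Split 'org/repo/branch' into ('org/repo', 'branch'); the branch may contain '/'."""
    match = _BRANCH_RE.match(entry)
    if match is None:
        raise ValueError(f"expected org/repo/branch, got {entry!r}")
    owner, repo, branch = match.groups()
    return f"{owner}/{repo}", branch


def split_tokens(value):
    """Split a comma-separated token string, dropping empty entries."""
    return [token.strip() for token in value.split(",") if token.strip()]


valid, skipped = split_repositories("daspirehq/daspire daspirehq/* not-a-repo")
print(valid)    # ['daspirehq/daspire', 'daspirehq/*']
print(skipped)  # ['not-a-repo']
print(parse_branch("daspirehq/daspire/main"))  # ('daspirehq/daspire', 'main')
```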
Supported streams
This source outputs the following full refresh streams:
- Assignees
- Branches
- Contributor Activity
- Collaborators
- Issue labels
- Organizations
- Pull request commits
- Tags
- TeamMembers
- TeamMemberships
- Teams
- Users
- Issue timeline events
This source outputs the following incremental streams:
- Comments
- Commit comment reactions
- Commit comments
- Commits
- Deployments
- Events
- Issue comment reactions
- Issue events
- Issue milestones
- Issue reactions
- Issues
- Project (Classic) cards
- Project (Classic) columns
- Projects (Classic)
- ProjectsV2
- Pull request comment reactions
- Pull request stats
- Pull requests
- Releases
- Repositories
- Review comments
- Reviews
- Stargazers
- WorkflowJobs
- WorkflowRuns
- Workflows
Notes
Only 4 streams (comments, commits, issues and review comments) from the streams listed above are pure incremental, meaning that they:
- read only new records;
- output only new records.
The workflow_runs and workflow_jobs streams are almost pure incremental:
- they read new records and some portion of old records (from the past 30 days);
- workflow_jobs depends on workflow_runs to read its data, so both streams follow the same logic;
- they output only new records.
The other 19 incremental streams are also incremental, with one difference. They:
- read all records;
- output only new records.
Keep this behaviour in mind when using those 19 incremental streams, because it may affect your API call limits.
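The "read all records, output only new records" behaviour can be illustrated with a small sketch. It is not the connector's implementation: the function name, the updated_at cursor field, and the sample records are assumptions made for illustration. Timestamps are taken to be ISO 8601 UTC strings in one format, so they compare correctly as plain strings.

```python
def emit_new_records(records, cursor_field, state=None):
    """Scan every record, but return only those newer than the saved cursor.

    Returns (new_records, new_state), where new_state is the highest cursor seen.
    """
    new_records = []
    new_state = state
    for record in records:  # every record is still read, which costs API calls
        value = record[cursor_field]
        if state is None or value > state:
            new_records.append(record)
        if new_state is None or value > new_state:
            new_state = value
    return new_records, new_state


records = [
    {"id": 1, "updated_at": "2024-01-01T00:00:00Z"},
    {"id": 2, "updated_at": "2024-02-01T00:00:00Z"},
    {"id": 3, "updated_at": "2024-03-01T00:00:00Z"},
]
new, state = emit_new_records(records, "updated_at", state="2024-01-15T00:00:00Z")
print([r["id"] for r in new])  # [2, 3]
print(state)                   # 2024-03-01T00:00:00Z
```

All three records are read on every sync, yet only two are output. This is why such streams consume API quota in proportion to the total size of the stream, not to the number of new records.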
For large streams, specifying a start_date far in the past may sometimes result in repeated errors from GitHub instead of records. In this case, specifying a more recent start_date may help.
The "Start date" configuration option does not apply to the streams below, because the GitHub API does not include dates that can be used for filtering: assignees, branches, collaborators, issue_labels, organizations, pull_request_commits, pull_request_stats, repositories, tags, teams, users.
Performance consideration
The GitHub integration should not run into GitHub API limitations under normal usage. Refer to GitHub article Rate limits for the REST API.
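If you call the GitHub REST API yourself with the same token, the remaining quota can be read from the x-ratelimit-remaining and x-ratelimit-reset response headers (the latter is a UTC epoch timestamp in seconds). The sketch below shows one way to decide how long to pause. The function name is hypothetical, the header dictionaries are made-up examples, and this is not a description of how Daspire handles rate limits internally.

```python
def seconds_until_reset(headers, now):
    """Return the number of seconds to wait before the next request.

    Returns 0 while quota remains; otherwise the time until the reset timestamp.
    """
    # HTTP header names are case-insensitive, so normalise the keys first.
    lowered = {key.lower(): value for key, value in headers.items()}
    remaining = int(lowered.get("x-ratelimit-remaining", "1"))
    if remaining > 0:
        return 0
    reset_at = int(lowered.get("x-ratelimit-reset", str(int(now))))
    return max(0, reset_at - int(now))


# Example values (assumed): quota exhausted, reset 60 seconds from "now".
exhausted = {"X-RateLimit-Remaining": "0", "X-RateLimit-Reset": "1700000060"}
print(seconds_until_reset(exhausted, now=1700000000))  # 60
```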
Troubleshooting
The maximum number of tables that can be synced at a time is 6,000. If fetching the schema fails because this limit has been reached, we advise you to adjust your settings.