GitHub
Site admins can sync Git repositories hosted on GitHub.com and GitHub Enterprise with Sourcegraph so that users can search and navigate the repositories.
To connect GitHub to Sourcegraph:
- Depending on whether you are a site admin or user:
  - Site admin: Go to Site admin > Manage repositories > Add repositories.
  - User: Go to Settings > Manage repositories.
- Select GitHub.
- Configure the connection to GitHub using the action buttons above the text field. Additional fields can be added using Cmd/Ctrl+Space for auto-completion. See the configuration documentation below.
- Press Add repositories.
NOTE: Adding code hosts as a user is currently in private beta.
Supported versions
- GitHub.com
- GitHub Enterprise v2.10 and newer
Selecting repositories for code search
There are four fields for configuring which repositories are mirrored/synchronized:
- repos: A list of repositories in owner/name format.
- orgs: A list of organizations (every repository belonging to the organization will be cloned).
- repositoryQuery: A list of strings, each of which is either one of three pre-defined options (public, affiliated, or none; these are not subject to result limitations) or a GitHub advanced search query. Note: There is an existing limitation that requires GitHub advanced search queries to return 1000 results or fewer. See this issue for ongoing work to address this limitation.
- exclude: A list of repositories to exclude, which takes precedence over the repos, orgs, and repositoryQuery fields.
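The precedence of exclude over the other three fields can be illustrated with a minimal sketch. This is not Sourcegraph's implementation: the helper name apply_exclude and the sample repository names are assumptions made for illustration, and only the "name" and "pattern" forms of an exclude entry (both shown in the configuration reference below) are handled.

```python
import re


def apply_exclude(candidates, exclude):
    """Return the candidate repositories that no exclude entry matches.

    Hypothetical helper: handles only {"name": ...} and {"pattern": ...} entries.
    """
    names = {entry["name"] for entry in exclude if "name" in entry}
    patterns = [re.compile(entry["pattern"]) for entry in exclude if "pattern" in entry]
    return [
        repo
        for repo in candidates
        if repo not in names and not any(p.search(repo) for p in patterns)
    ]


# Sample data: repositories already selected via repos, orgs, and repositoryQuery.
candidates = ["vuejs/vue", "golang/go", "topsecretorg/plans"]
exclude = [{"name": "vuejs/vue"}, {"pattern": "^topsecretorg/.*"}]

remaining = apply_exclude(candidates, exclude)
print(remaining)
```

Whatever the other fields select, a repository matched by an exclude entry is dropped last, which is why exclude wins.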
GitHub API token and access
The GitHub service requires a token in order to access the GitHub API. There are two different types of tokens you can supply:
- Personal access token: This gives Sourcegraph the same level of access to repositories as the account that created the token. If you don't want to mix your personal repositories with your organization's repositories, you can add an entry to the exclude array, or you can use a machine user token.
- Machine user token: Generates a token for a machine user that is affiliated with an organization instead of a user account.
No token scopes are required if you only want to sync public repositories and don't want to use any of the following features. Otherwise, the following token scopes are required:
- repo to sync private repositories from GitHub to Sourcegraph.
- read:org to use the "allowOrgs" setting with a GitHub authentication provider.
- repo, read:org, and read:discussion to use campaigns with GitHub repositories. See "Code host interactions in campaigns" for details.
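The scope requirements above can be summarized as a small lookup. The feature keys (private_repos, allow_orgs, campaigns) and the function name are hypothetical labels chosen for this sketch; only the scope names come from the list above.

```python
# Scopes per feature, taken from the list above. The feature keys are
# hypothetical labels for this sketch, not Sourcegraph setting names.
SCOPES_BY_FEATURE = {
    "private_repos": {"repo"},
    "allow_orgs": {"read:org"},
    "campaigns": {"repo", "read:org", "read:discussion"},
}


def required_scopes(features):
    """Return the sorted union of token scopes needed for the given features."""
    scopes = set()
    for feature in features:
        scopes |= SCOPES_BY_FEATURE[feature]
    return sorted(scopes)


print(required_scopes(["private_repos", "allow_orgs"]))
```

An empty feature list yields no scopes, which matches the case of syncing public repositories only.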
GitHub.com rate limits
You should always include a token in a configuration for a GitHub.com URL to avoid being denied service by GitHub's unauthenticated rate limits. If you don't want to automatically synchronize repositories from the account associated with your personal access token, you can create a token without a repo scope for the purposes of bypassing rate limit restrictions only.
Internal rate limits
Internal rate limiting can be configured to limit the rate at which requests are made from Sourcegraph to GitHub.
If enabled, the default rate is 5000 requests per hour, which can be changed via the requestsPerHour field (see below). If rate limiting is configured more than once for the same code host instance, the most restrictive limit will be used.
NOTE: Internal rate limiting is currently only applied when synchronizing campaign changesets.
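As a rough illustration of the "most restrictive limit" rule, the sketch below picks the lowest requestsPerHour among several rateLimit objects for the same code host. The function name is hypothetical, and skipping entries whose enabled flag is false is an assumption of this sketch rather than documented behavior.

```python
def effective_requests_per_hour(rate_limits):
    """Return the lowest enabled requestsPerHour, or None if none is enabled.

    Assumption: entries with "enabled": false do not take part in the comparison.
    """
    enabled = [rl["requestsPerHour"] for rl in rate_limits if rl.get("enabled")]
    return min(enabled) if enabled else None


# Sample rateLimit objects from several connections to the same GitHub instance.
limits = [
    {"enabled": True, "requestsPerHour": 5000},
    {"enabled": True, "requestsPerHour": 1200},
    {"enabled": False, "requestsPerHour": 100},
]

effective = effective_requests_per_hour(limits)
print(effective)
```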
Repository permissions
By default, all Sourcegraph users can view all repositories. To configure Sourcegraph to use GitHub's per-user repository permissions, see "Repository permissions".
User authentication
To configure GitHub as an authentication provider (which will enable sign-in via GitHub), see the authentication documentation.
Webhooks
The webhooks setting allows specifying the organization webhook secrets necessary to authenticate incoming webhook requests to /.api/github-webhooks.
"webhooks": [ {"org": "your_org", "secret": "verylongrandomsecret"} ]
Using webhooks is highly recommended when using campaigns, since they speed up the syncing of pull request data between GitHub and Sourcegraph and make it more efficient.
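For background on what the secret is used for: GitHub signs each webhook payload with an HMAC of the request body keyed by the secret, and sends the hex digest in the X-Hub-Signature-256 header as sha256=<digest>. The sketch below shows how a receiver could check such a signature. It is illustrative only and does not claim to be how Sourcegraph performs the check; the function name and sample payload are assumptions.

```python
import hashlib
import hmac


def signature_is_valid(secret, body, signature_header):
    """Check a GitHub X-Hub-Signature-256 header value against the raw request body."""
    expected = "sha256=" + hmac.new(secret.encode(), body, hashlib.sha256).hexdigest()
    # compare_digest avoids leaking how many leading characters matched.
    return hmac.compare_digest(expected, signature_header)


# Sample data: compute the header the way a sender holding the secret would.
secret = "verylongrandomsecret"
body = b'{"action": "opened"}'
header = "sha256=" + hmac.new(secret.encode(), body, hashlib.sha256).hexdigest()

print(signature_is_valid(secret, body, header))
print(signature_is_valid(secret, body + b" ", header))
```

A request whose body was altered, or that was signed with a different secret, fails the check.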
To set up webhooks:
- In Sourcegraph, go to Site admin > Manage repositories and edit the GitHub configuration.
- Add the "webhooks" property to the configuration (you can generate a secret with openssl rand -hex 32):
"webhooks": [{"org": "your_org", "secret": "verylongrandomsecret"}]
- Click Update repositories.
- Copy the webhook URL displayed below the Update repositories button.
- On GitHub, go to the settings page of your organization. From there, click Settings, then Webhooks, then Add webhook.
- Fill in the webhook form:
  - Payload URL: the URL you copied above from Sourcegraph.
  - Content type: this must be set to application/json.
  - Secret: the secret token you configured Sourcegraph to use above.
  - Which events: select Let me select individual events, and then enable:
    - Issue comments
    - Pull requests
    - Pull request reviews
    - Pull request review comments
    - Check runs
    - Check suites
    - Statuses
  - Active: ensure this is enabled.
- Click Add webhook.
- Confirm that the new webhook is listed.
Done! Sourcegraph will now receive webhook events from GitHub and use them to sync pull request events, used by campaigns, faster and more efficiently.
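If openssl is not available, a secret of the same shape (32 random bytes as 64 hex characters) can be generated with Python's standard library. This is a sketch of an alternative, not a documented Sourcegraph workflow; your_org is the placeholder used in the steps above.

```python
import json
import secrets

# 32 random bytes rendered as 64 hex characters, the same shape as
# the output of `openssl rand -hex 32`.
secret = secrets.token_hex(32)

# The "webhooks" property to paste into the GitHub configuration.
webhooks_property = {"webhooks": [{"org": "your_org", "secret": secret}]}
print(json.dumps(webhooks_property, indent=2))
```

The same secret value must then be entered in the Secret field of the webhook form on GitHub.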
Configuration
GitHub connections support the following configuration options, which are specified in the JSON editor in the site admin "Manage repositories" area.
admin/external_service/github.schema.json
{
  // If non-null, enforces GitHub repository permissions. This requires that there is an item in the `auth.providers` field of type "github" with the same `url` field as specified in this `GitHubConnection`.
  "authorization": null,

  // TLS certificate of the GitHub Enterprise instance. This is only necessary if the certificate is self-signed or signed by an internal CA. To get the certificate run `openssl s_client -connect HOST:443 -showcerts < /dev/null 2> /dev/null | openssl x509 -outform PEM`. To escape the value into a JSON string, you may want to use a tool like https://json-escape-text.now.sh.
  "certificate": null,
  // Other example values:
  // - "-----BEGIN CERTIFICATE-----\n..."

  // When set to true, this external service will be chosen as our 'Global' GitHub service. Only valid on Sourcegraph.com. Only one service can have this flag set.
  "cloudGlobal": false,

  // A list of repositories to never mirror from this GitHub instance. Takes precedence over "orgs", "repos", and "repositoryQuery" configuration.
  //
  // Supports excluding by name ({"name": "owner/name"}) or by ID ({"id": "MDEwOlJlcG9zaXRvcnkxMTczMDM0Mg=="}).
  //
  // Note: ID is the GitHub GraphQL ID, not the GitHub database ID. eg: "curl https://api.github.com/repos/vuejs/vue | jq .node_id"
  "exclude": null,
  // Other example values:
  // - [{"forks":true}]
  // - [
  //     {
  //       "name": "owner/name"
  //     },
  //     {
  //       "id": "MDEwOlJlcG9zaXRvcnkxMTczMDM0Mg=="
  //     }
  //   ]
  // - [
  //     {
  //       "name": "vuejs/vue"
  //     },
  //     {
  //       "name": "php/php-src"
  //     },
  //     {
  //       "pattern": "^topsecretorg/.*"
  //     }
  //   ]

  // The type of Git URLs to use for cloning and fetching Git repositories on this GitHub instance.
  //
  // If "http", Sourcegraph will access GitHub repositories using Git URLs of the form http(s)://github.com/myteam/myproject.git (using https: if the GitHub instance uses HTTPS).
  //
  // If "ssh", Sourcegraph will access GitHub repositories using Git URLs of the form git@github.com:myteam/myproject.git. See the documentation for how to provide SSH private keys and known_hosts: https://docs.sourcegraph.com/admin/repo/auth#repositories-that-need-http-s-or-ssh-authentication.
  "gitURLType": "http",

  // Deprecated and ignored field which will be removed entirely in the next release. GitHub repositories can no longer be enabled or disabled explicitly. Configure repositories to be mirrored via "repos", "exclude" and "repositoryQuery" instead.
  "initialRepositoryEnablement": null,

  // An array of organization names identifying GitHub organizations whose repositories should be mirrored on Sourcegraph.
  "orgs": null,
  // Other example values:
  // - ["name"]
  // - [
  //     "kubernetes",
  //     "golang",
  //     "facebook"
  //   ]

  // Rate limit applied when making background API requests to GitHub.
  "rateLimit": {
    "enabled": true,
    "requestsPerHour": 5000
  },

  // An array of repository "owner/name" strings specifying which GitHub or GitHub Enterprise repositories to mirror on Sourcegraph.
  "repos": null,
  // Other example values:
  // - ["owner/name"]
  // - [
  //     "kubernetes/kubernetes",
  //     "golang/go",
  //     "facebook/react"
  //   ]

  // The pattern used to generate the corresponding Sourcegraph repository name for a GitHub or GitHub Enterprise repository. In the pattern, the variable "{host}" is replaced with the GitHub host (such as github.example.com), and "{nameWithOwner}" is replaced with the GitHub repository's "owner/path" (such as "myorg/myrepo").
  //
  // For example, if your GitHub Enterprise URL is https://github.example.com and your Sourcegraph URL is https://src.example.com, then a repositoryPathPattern of "{host}/{nameWithOwner}" would mean that a GitHub repository at https://github.example.com/myorg/myrepo is available on Sourcegraph at https://src.example.com/github.example.com/myorg/myrepo.
  //
  // It is important that the Sourcegraph repository name generated with this pattern be unique to this code host. If different code hosts generate repository names that collide, Sourcegraph's behavior is undefined.
  "repositoryPathPattern": "{host}/{nameWithOwner}",

  // An array of strings specifying which GitHub or GitHub Enterprise repositories to mirror on Sourcegraph. The valid values are:
  //
  // - `public` mirrors all public repositories for GitHub Enterprise and is the equivalent of `none` for GitHub
  //
  // - `affiliated` mirrors all repositories affiliated with the configured token's user:
  //   - Private repositories with read access
  //   - Public repositories owned by the user or their orgs
  //   - Public repositories with write access
  //
  // - `none` mirrors no repositories (except those specified in the `repos` configuration property or added manually)
  //
  // - All other values are executed as a GitHub advanced repository search as described at https://github.com/search/advanced. Example: to sync all repositories from the "sourcegraph" organization including forks the query would be "org:sourcegraph fork:true".
  //
  // If multiple values are provided, their results are unioned.
  //
  // If you need to narrow the set of mirrored repositories further (and don't want to enumerate it with a list or query set as above), create a new bot/machine user on GitHub or GitHub Enterprise that is only affiliated with the desired repositories.
  "repositoryQuery": [
    "none"
  ],

  // A GitHub personal access token. Create one for GitHub.com at https://github.com/settings/tokens/new?description=Sourcegraph (for GitHub Enterprise, replace github.com with your instance's hostname). See https://docs.sourcegraph.com/admin/external_service/github#github-api-token-and-access for which scopes are required for which use cases.
  "token": null,

  // URL of a GitHub instance, such as https://github.com or https://github-enterprise.example.com.
  "url": null,
  // Other example values:
  // - "https://github.com"
  // - "https://github-enterprise.example.com"

  // An array of configurations defining existing GitHub webhooks that send updates back to Sourcegraph.
  "webhooks": null
  // Other example values:
  // - [
  //     {
  //       "org": "yourorgname",
  //       "secret": "webhook-secret"
  //     }
  //   ]
}
Troubleshooting
RepositoryQuery returns first 1000 results only
GitHub's Search API only returns the first 1000 results. Therefore a repositoryQuery (other than the three pre-defined options) needs to return 1000 results or fewer; otherwise Sourcegraph will not synchronize some repositories. To work around this limitation, you can split your query into multiple queries, each returning 1000 results or fewer. For example, if your query is org:Microsoft fork:no, you can adjust your query to:
{
  // ...
  "repositoryQuery": [
    "org:Microsoft fork:no created:>=2019",
    "org:Microsoft fork:no created:2018",
    "org:Microsoft fork:no created:2016..2017",
    "org:Microsoft fork:no created:<2016"
  ]
}
If splitting by creation date does not work, try another field. See GitHub advanced search query for other fields you can try.
See Handle GitHub repositoryQuery that has more than 1000 results for ongoing work to address this limitation.
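When a query has to be split into many date ranges, writing the list by hand gets tedious. The sketch below generates one sub-query per creation year, using the same created: qualifier forms as the example above. The function name and the choice of one bucket per year are assumptions of this sketch; each generated query still has to be checked against GitHub to confirm it returns 1000 results or fewer.

```python
import json


def split_by_creation_year(base_query, first_year, last_year):
    """Split base_query into sub-queries that together cover all creation dates.

    Produces one query for everything before first_year, one per year in
    [first_year, last_year), and one for last_year onwards.
    """
    queries = [f"{base_query} created:<{first_year}"]
    queries += [f"{base_query} created:{year}" for year in range(first_year, last_year)]
    queries.append(f"{base_query} created:>={last_year}")
    return queries


queries = split_by_creation_year("org:Microsoft fork:no", 2016, 2019)
print(json.dumps({"repositoryQuery": queries}, indent=2))
```

If a single year still exceeds the limit, that bucket can be narrowed further or split on another field.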