Deploying Sourcegraph executors
Executors provide a sandbox that can run resource-intensive or untrusted tasks on behalf of the Sourcegraph instance, such as:
- Automatically indexing a repository for precise code intelligence
- Running batch changes server-side
Requirements
Our sandboxing model requires the executor binary to run on machines that support Linux KVM extensions. This requires either bare-metal machines or machines capable of nested virtualization, which most popular cloud providers make available.
Installation
Executors operate outside of your Sourcegraph instance and must be run separately from your Sourcegraph server deployment method.
Since they must still be able to reach the Sourcegraph instance in order to dequeue and perform work, requests between the Sourcegraph instance and the executors are authenticated via a shared secret.
That means, in order to deploy executors that can talk to the Sourcegraph instance, you need to do the following:
- Configure a shared secret in the Sourcegraph instance
- Run executors (using our Terraform modules or the executor binaries)
- Confirm executors can reach the Sourcegraph instance
- Optional: Configure auto scaling
- Optional: Configure observability
Configure Sourcegraph
Before starting any executors, generate an arbitrary secret string (with at least 20 characters) and set it as the executors.accessToken
key in your Sourcegraph instance's site-config.
Once the access token is set, executors can use that access token to talk to the Sourcegraph instance.
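For example, one way to generate such a secret (the same command used in the example installation later on this page):

```bash
# Generates a random 20-character string to use as executors.accessToken
cat /dev/random | base64 | head -c 20
```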
Run executors
There are two ways to install and run executors:
- Using our Terraform modules to provision machines on Google Cloud or AWS that run executors
- Downloading and running executor binaries yourself
Terraform
We supply Terraform modules to provision machines running executors on common cloud providers (Google Cloud or AWS).
A Terraform definition of executor compute resources will look similar to the following basic, minimal usage. Here, we configure the use of a Terraform module defined in the public registry - no explicit installation or clone step is required to use the modules provided by Sourcegraph.
```hcl
module "executors" {
  source  = "sourcegraph/executors/<cloud>"
  version = "<version>"

  executor_sourcegraph_external_url            = "<sourcegraph_external_url>"
  executor_sourcegraph_executor_proxy_password = "<shared_secret_configured_in_sourcegraph_instance>"
  executor_queue_name                          = "codeintel" # Type of work (e.g., codeintel, batches)
  executor_instance_tag                        = "codeintel"
  executor_metrics_environment_label           = "prod"
  docker_mirror_static_ip                      = "10.0.1.4"
}
```
Two variables must be supplied to the module in order for it to contact your Sourcegraph instance:
- sourcegraph_external_url (Google; AWS): The public URL of your Sourcegraph instance. This corresponds to the externalURL value in your Sourcegraph instance's site-config and must be resolvable from the provisioned executor compute resources.
- sourcegraph_executor_proxy_password (Google; AWS): The access token chosen and configured above.
Additional values may need to be supplied for a specific cloud provider. Refer to the relevant Terraform module documentation for specifics.
To deploy the executor compute resources defined in the Terraform file above, simply run terraform apply.
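A typical workflow, assuming the module definition above is saved in a working directory of its own, looks roughly like this:

```bash
# Download the Sourcegraph executor modules and providers
terraform init

# Review the planned changes (optional, but recommended)
terraform plan

# Provision the executor compute resources
terraform apply
```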
See the Examples for more information on how to configure and deploy single or multiple executors.
Binaries
You can also download and run the executor binaries yourself, without using Terraform.
The following dependencies need to be available on the machine on which you want to run the executor
binary:
- Docker
- git >= v2.18
- Ignite v0.10.0
- CNI Plugins v0.9.1
You can also take a look at what goes into our executor machine images, used by our Terraform modules, to see how we run executor binaries.
Once dependencies are met, you can download and run executor binaries:
Step 1: Confirm that virtualization is enabled
The following command checks whether virtualization is enabled on the machine, which is required for our sandboxing model.
If it prints something other than 0, virtualization is enabled.
grep -cw vmx /proc/cpuinfo
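As an additional sanity check (this assumes the KVM kernel modules are already loaded), you can also confirm that the KVM device is present:

```bash
# A character device here means KVM is available for Firecracker microVMs
ls -l /dev/kvm
```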
Step 2: Download latest binary
Below are the download links for the insiders release (latest) of executors:

Download and set up the executor binary:
```bash
curl -sfLo executor https://storage.googleapis.com/sourcegraph-artifacts/executor/latest/linux-amd64/executor
chmod +x executor
mv executor /usr/local/bin
```
Step 3: Generate Ignite base image
This creates the base image that Ignite will use when creating new Firecracker microVMs for each job.
```bash
# Change this to use the version of src-cli that's compatible with your Sourcegraph instance.
# See this for details: https://github.com/sourcegraph/src-cli#version-compatible-with-your-sourcegraph-instance
export SRC_CLI_VERSION="3.34.1"
export EXECUTOR_FIRECRACKER_IMAGE="sourcegraph/ignite-ubuntu:insiders"

# Download Dockerfile and build Docker image
mkdir -p /tmp/ignite-ubuntu
curl -sfLo /tmp/ignite-ubuntu/Dockerfile https://raw.githubusercontent.com/sourcegraph/sourcegraph/main/enterprise/cmd/executor/image/ignite-ubuntu/Dockerfile
docker build -t "${EXECUTOR_FIRECRACKER_IMAGE}" --build-arg SRC_CLI_VERSION="${SRC_CLI_VERSION}" /tmp/ignite-ubuntu

# Import Docker image into Ignite
ignite image import --runtime docker "${EXECUTOR_FIRECRACKER_IMAGE}"

# Clean up
docker image rm "${EXECUTOR_FIRECRACKER_IMAGE}"
```
Step 4: Optional: Pre-heat Ignite by importing kernel image
This step imports the kernel image to avoid the executor having to import it when executing the first job.
```bash
# Change this to match the version of Ignite you're using.
export IGNITE_VERSION=v0.10.0
export KERNEL_IMAGE="weaveworks/ignite-kernel:5.10.51"

ignite kernel import --runtime docker "${KERNEL_IMAGE}"
docker pull "weaveworks/ignite:${IGNITE_VERSION}"
```
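To verify the imports from the previous two steps (a quick sanity check, not strictly required), you can list what Ignite now has available:

```bash
# Should list the sourcegraph/ignite-ubuntu base image imported in Step 3
ignite image ls

# Should list the weaveworks/ignite-kernel image imported above
ignite kernel ls
```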
Step 5: Setup required environment variables
Env var | Example value | Description |
---|---|---|
EXECUTOR_FRONTEND_URL | http://sourcegraph.example.com | The external URL of the Sourcegraph instance. |
EXECUTOR_FRONTEND_PASSWORD | our-shared-secret | The shared secret configured in the Sourcegraph instance under executors.accessToken. |
EXECUTOR_QUEUE_NAME | batches | The name of the queue to pull jobs from. Possible values: batches and codeintel. |
```bash
# Example:
export EXECUTOR_QUEUE_NAME=batches
export EXECUTOR_FRONTEND_URL=http://sourcegraph.example.com
export EXECUTOR_FRONTEND_PASSWORD=hunter2hunter2hunter2
```
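If you manage the executor with systemd (as the machine images built by our Terraform modules do, via /etc/systemd/system/executor.env), you might instead persist these settings in an environment file; the path and values here are illustrative:

```bash
# Hypothetical: write the executor settings to an environment file read by a systemd unit
cat <<'EOF' | sudo tee /etc/systemd/system/executor.env
EXECUTOR_QUEUE_NAME=batches
EXECUTOR_FRONTEND_URL=http://sourcegraph.example.com
EXECUTOR_FRONTEND_PASSWORD=hunter2hunter2hunter2
EOF
```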
Step 6: Start executor
/usr/local/bin/executor
Confirm executors are working
If executor instances boot correctly and can authenticate with the Sourcegraph frontend, they will show up in the Executors page under Site Admin > Maintenance.
Examples
The following are complete examples of provisioning a single executor type using each of the provided Terraform modules. In these examples, executors pull from the queue codeintel, which controls auto-indexing.

The following are complete examples of provisioning multiple executor types using the provided Terraform submodules. In these examples, two pools of executors pull from the codeintel and batches queues, which control auto-indexing and running batch changes server-side, respectively.
Example installation
Let's walk through setting up a single executor VM on GCP and indexing a repository.
- Install Terraform 0.13.7 (must match the version listed in .tool-versions):

  ```bash
  brew install tfenv
  tfenv install 0.13.7
  tfenv use 0.13.7
  ```
- Install gcloud
- Run gcloud auth application-default login
- Open your Sourcegraph instance in your browser, click your profile in the top right, click Site admin, expand Configuration, click Site configuration, and set:
  - "externalURL": "<URL>" to a URL that is accessible from the GCP VM that will be created later (e.g. a public URL such as https://sourcegraph.example.com)
  - "executors.accessToken": "<new long secret>" to a new long secret (e.g. cat /dev/random | base64 | head -c 20)
  - "codeIntelAutoIndexing.enabled": true
- Download the example files and change these:
  - project: your GCP project name, and change region and zone if needed
  - executor_sourcegraph_external_url: this must match the externalURL you set in your site config
  - executor_sourcegraph_executor_proxy_password: this must match the executors.accessToken you set in your site config
- Run terraform init to download the Sourcegraph executor modules
- Run terraform apply and enter "yes" to create the executor VM
- Go back to the site admin page, expand Maintenance, click Executors, and check to see if your executor shows up in the list with a green dot 🟢. If it's not there:
  - Make sure terraform apply exited with code 0 and did not print any errors
  - Make sure a GCP VM was created:
    ```
    $ gcloud compute instances list
    NAME                                           ZONE           MACHINE_TYPE   PREEMPTIBLE  INTERNAL_IP  EXTERNAL_IP  STATUS
    sourcegraph-executor-h0rv                      us-central1-c  n1-standard-4               10.0.1.16    ...          RUNNING
    sourcegraph-executors-docker-registry-mirror   us-central1-c  n1-standard-2               10.0.1.2     ...          RUNNING
    ```
  - Make sure the executor service is running:
    ```
    you@sourcegraph-executor-h0rv:~$ systemctl status executor
    🟢 executor.service - User code executor
         Loaded: loaded (/etc/systemd/system/executor.service; enabled; vendor preset: enabled)
         Active: active (running) since Thu 2021-11-18 02:28:48 UTC; 19s ago
    ```
  - Make sure there are no errors in the executor service logs:
    ```
    you@sourcegraph-executor-h0rv:~$ journalctl -u executor | less
    Nov 18 02:31:01 sourcegraph-executor-h0rv executor[2465]: t=2021-11-18T02:31:01+0000 lvl=dbug msg="TRACE internal" host=... path=/.executors/queue/codeintel/dequeue code=204 duration=92.131237ms
    Nov 18 02:31:01 sourcegraph-executor-h0rv executor[2465]: t=2021-11-18T02:31:01+0000 lvl=dbug msg="TRACE internal" host=... path=/.executors/queue/codeintel/canceled code=200 duration=90.630467ms
    Nov 18 02:31:02 sourcegraph-executor-h0rv executor[2465]: t=2021-11-18T02:31:02+0000 lvl=dbug msg="TRACE internal" host=... path=/.executors/queue/codeintel/dequeue code=204 duration=91.269106ms
    Nov 18 02:31:02 sourcegraph-executor-h0rv executor[2465]: t=2021-11-18T02:31:02+0000 lvl=dbug msg="TRACE internal" host=... path=/.executors/queue/codeintel/canceled code=200 duration=161.469685ms
    ```
  - Make sure the EXECUTOR_FRONTEND_URL and EXECUTOR_FRONTEND_PASSWORD in /etc/systemd/system/executor.env are correct
  - Make sure the VM can hit your externalURL:
    ```
    you@sourcegraph-executor-h0rv:~$ curl <your externalURL here>
    <a href="/sign-in?returnTo=%2F">Found</a>
    ```
- Go back to the site admin page, expand Code intelligence, click Configuration, click Create new policy, and fill in:
  - Name: LSIF
  - Click add a repository pattern
  - Repository pattern #1: set this to an existing repository on your Sourcegraph instance (e.g. github.com/gorilla/mux)
  - Type: HEAD
  - Auto-indexing: Enabled
- Go to that repository's page, click Code Intelligence, click Auto-indexing, and check to see if an indexing job has appeared. If nothing is there:
  - Try clicking Enqueue
  - Try setting a higher update frequency: PRECISE_CODE_INTEL_AUTO_INDEXING_TASK_INTERVAL=10s
  - Try setting a lower delay: PRECISE_CODE_INTEL_AUTO_INDEXING_REPOSITORY_PROCESS_DELAY=10s
- Once you have a completed indexing job, click Uploads and check to see that the LSIF dump has been uploaded.
- Once the dump has been uploaded, you should see the PRECISE badge in the hover popover! 🎉
- Optionally, add .terraform, terraform.tfstate, and terraform.tfstate.backup to your .gitignore.
Configuring auto scaling
Auto scaling of executor instances can help to increase the concurrency of jobs without paying for unused resources. With auto scaling, you can scale down to 0 instances when no workload exists and scale up as far as you like and as far as your cloud provider can support. Auto scaling needs to be configured separately.
Auto scaling makes use of the auto-scaling capabilities of the respective cloud provider (AutoScalingGroups on AWS and Instance Groups on GCP). Sourcegraph's worker service publishes a scaling metric (that is, the number of jobs in the queue) to the cloud provider. Then, based on that reported value, the auto scaler adds and removes compute resources to match the required amount of compute. The auto scaler will attempt to keep one instance running per jobs_per_instance_scaling items in the queue.
For example, if jobs_per_instance_scaling is set to 20 and the queue size is currently 400, then 20 instances would be determined as required to handle the load. You might want to tweak this number based on the machine type, the concurrency per machine, and the desired processing speed.
With the Terraform variables min_replicas and max_replicas in the Terraform modules linked to above, you can configure the minimum and maximum number of compute machines to be run at a given time.
For auto scaling to work, two things must be true:
- min_replicas must be >= 0 and max_replicas must be > min_replicas.
- The Sourcegraph instance (its worker service, specifically) needs to publish scaling metrics to the cloud provider in use.
For the latter to work, the Sourcegraph instance needs to be configured with the correct credentials that allow it to access the cloud provider.
The credentials submodule in both our AWS and GCP executor modules exists for that purpose. When used, the credentials module sets up the credentials on the cloud provider and returns them in the Terraform outputs.

Here's an example of how one would use the credentials submodule:
```hcl
module "my-credentials" {
  source = "sourcegraph/executors/<cloud>//modules/credentials"
  # Find the latest version here:
  # - https://github.com/sourcegraph/terraform-google-executors/tags
  # - https://github.com/sourcegraph/terraform-aws-executors/tags
  version = "<version>"

  region          = <region>
  resource_prefix = ""
}

# For Google:
output "metric_writer_credentials_file" {
  value = module.my-credentials.metric_writer_credentials_file
}

# For AWS:
output "metric_writer_access_key_id" {
  value = module.my-credentials.metric_writer_access_key_id
}
output "metric_writer_secret_key" {
  value = module.my-credentials.metric_writer_secret_key
}
```
After a terraform apply, a terraform output should yield something like this:
```
# For AWS:
metric_writer_access_key_id = <THE_ACCESS_KEY_TO_CONFIGURE>
metric_writer_secret_key = <THE_SECRET_KEY_TO_CONFIGURE>

# For Google:
metric_writer_credentials_file = <THE_CREDENTIALS_FILE_CONTENT_BASE64_ENCODED>
```
These outputs can then be used to configure the Sourcegraph instance, which is explained in the following sections for the different cloud providers.
Google

The GCE auto-scaling groups configured by the Sourcegraph Terraform module respond to changes in metric values written to Cloud Monitoring. The target Sourcegraph instance is expected to continuously write these values.
To write the scaling metric to Cloud Monitoring, the worker service must have defined the following environment variables:
- EXECUTOR_METRIC_ENVIRONMENT_LABEL: Must use the value that was set as metrics_environment_label when provisioning executors
- EXECUTOR_METRIC_GCP_PROJECT_ID: The GCP project ID
- Option 1: EXECUTOR_METRIC_GOOGLE_APPLICATION_CREDENTIALS_FILE_CONTENT: The base64-decoded output of the metric_writer_credentials_file from above.
- Option 2: EXECUTOR_METRIC_GOOGLE_APPLICATION_CREDENTIALS_FILE: Path to a file containing the base64-decoded metric_writer_credentials_file from above.
AWS
The EC2 auto-scaling groups configured by the Sourcegraph Terraform module respond to changes in metric values written to CloudWatch. The target Sourcegraph instance is expected to continuously write these values.
To write the scaling metric to CloudWatch, the worker service must have defined the following environment variables:
- EXECUTOR_METRIC_ENVIRONMENT_LABEL: Must use the same value as metrics_environment_label
- EXECUTOR_METRIC_AWS_NAMESPACE: Must be set to sourcegraph-executor
- EXECUTOR_METRIC_AWS_REGION
- EXECUTOR_METRIC_AWS_ACCESS_KEY_ID
- EXECUTOR_METRIC_AWS_SECRET_ACCESS_KEY
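Again purely as an illustration, with placeholder values taken from the credentials module outputs:

```bash
export EXECUTOR_METRIC_ENVIRONMENT_LABEL=prod   # must match metrics_environment_label
export EXECUTOR_METRIC_AWS_NAMESPACE=sourcegraph-executor
export EXECUTOR_METRIC_AWS_REGION=us-west-2
export EXECUTOR_METRIC_AWS_ACCESS_KEY_ID=<metric_writer_access_key_id>
export EXECUTOR_METRIC_AWS_SECRET_ACCESS_KEY=<metric_writer_secret_key>
```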
Testing auto scaling
Once these are set and the worker service has been restarted, you should be able to find the scaling metrics in your cloud provider's dashboards.

To test whether the metric is correctly reported to the cloud provider:
- On Google Cloud, this can be found in the Metrics explorer. Select Resource type: Global and then Metric: custom/executors/queue/size. You should see values reported here. A value of 0 also indicates that it works correctly.
- On AWS, this can be found in the CloudWatch metrics section. Under All metrics, select the namespace sourcegraph-executor and then the metric environment, queueName. Make sure there are entries returned.
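If you prefer the command line, one way to spot-check the AWS side (assuming the AWS CLI is configured with credentials that can read CloudWatch) is:

```bash
# Lists the metrics the worker service has reported under the sourcegraph-executor namespace
aws cloudwatch list-metrics --namespace sourcegraph-executor
```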
Next, you can test whether the number of executors rises and shrinks as load spikes occur. Keep in mind that auto scaling is not a real-time operation on most cloud providers; there is usually a short delay between the metric changing and the desired machine count adjusting.
Configuring observability
Sourcegraph ships with dashboards that can display executor metrics. We highly encourage setting this up to help make informed decisions on scaling and to make debugging easier.
In order to do that, the Prometheus instance bundled with your Sourcegraph deployment must be able to scrape the executor metrics endpoint.
That requires two things:
- Provide Prometheus with service account credentials that allow it to get a list of active compute instances from the cloud provider.
- Add additional scrape jobs to Prometheus.
To add service account credentials, you can use the credentials submodule in both our AWS and GCP executor modules.
```hcl
module "my-credentials" {
  source  = "sourcegraph/executors/<cloud>//modules/credentials"
  version = "<version>"

  region          = <region>
  resource_prefix = ""
}

# For Google:
output "instance_scraper_credentials_file" {
  value = module.my-credentials.instance_scraper_credentials_file
}

# For AWS:
output "instance_scraper_access_key_id" {
  value = module.my-credentials.instance_scraper_access_key_id
}
output "instance_scraper_access_secret_key" {
  value = module.my-credentials.instance_scraper_access_secret_key
}
```
Just as with auto scaling, you use the credentials submodule to get properly configured credentials in the Terraform outputs. When applied, this will yield something like this:
```
# For AWS:
instance_scraper_access_key_id = <THE_ACCESS_KEY_TO_CONFIGURE>
instance_scraper_access_secret_key = <THE_SECRET_KEY_TO_CONFIGURE>

# For Google:
instance_scraper_credentials_file = <THE_CREDENTIALS_FILE_CONTENT>
```
Now we can use these credentials for the different cloud providers.
Google

Credentials need to be added to the Prometheus container of your Sourcegraph deployment and a new scrape job needs to be added.

In a Kubernetes deployment, credentials can be added by mounting the credentials file obtained from the credentials module in the last step and pointing to it from an environment variable.
Step 1: Create a secret called prometheus-secrets containing the credentials file content:
```yaml
apiVersion: v1
kind: Secret
type: Opaque
metadata:
  name: prometheus-secrets
data:
  # The Terraform output for `instance_scraper_credentials_file`
  GCP_ACCOUNT_JSON: <THE_CREDENTIALS_FILE_CONTENT>
```
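As an alternative to writing the manifest by hand, a rough equivalent with kubectl might look like this (a sketch; it assumes a Terraform version that supports terraform output -raw, and relies on the GCP output being base64-encoded, so it is decoded first and kubectl re-encodes it into the Secret):

```bash
kubectl create secret generic prometheus-secrets \
  --from-literal=GCP_ACCOUNT_JSON="$(terraform output -raw instance_scraper_credentials_file | base64 -d)"
```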
Step 2: Modify the Prometheus deployment manifest:
```yaml
containers:
  - name: prometheus
    # [...]
    env:
      - name: GOOGLE_APPLICATION_CREDENTIALS
        value: /credentials/google_application_credentials.json
    volumeMounts:
      - mountPath: /credentials/google_application_credentials.json
        name: credentials
        subPath: google_application_credentials.json
        readOnly: true
volumes:
  - name: credentials
    secret:
      secretName: prometheus-secrets
      items:
        - key: GCP_ACCOUNT_JSON
          path: google_application_credentials.json
```
Step 3: Add the following scrape job that uses GCE service discovery configuration to the Prometheus configuration. To do that, you can edit the Prometheus ConfigMap and modify the contents of the prometheus.yml file. Under scrape_configs: add the following and make sure to replace {GCP_PROJECT}, {GCP_ZONE} and {INSTANCE_TAG}. The {INSTANCE_TAG} value must be the same as instance_tag.
```yaml
- job_name: 'sourcegraph-executors'
  metrics_path: /proxy
  params:
    module: [executor]
  gce_sd_configs: &executor_gce_config
    - project: {GCP_PROJECT} # Change this to the GCP project ID
      port: 9999
      zone: {GCP_ZONE} # Change this to the GCP zone
      filter: '(labels.executor_tag = {INSTANCE_TAG})' # Change {INSTANCE_TAG} to the `executor_instance_tag` set in the Terraform modules
  relabel_configs: &executor_relabel_config
    - source_labels: [__meta_gce_public_ip]
      target_label: __address__
      replacement: "${1}${2}:9999"
      separator: ''
    - source_labels: [__meta_gce_zone]
      regex: ".+/([^/]+)"
      target_label: zone
      separator: ''
    - source_labels: [__meta_gce_project]
      target_label: project
    - source_labels: [__meta_gce_instance_name]
      target_label: instance
      separator: ''
    - regex: "__meta_gce_metadata_(image_.+)"
      action: labelmap
- job_name: 'sourcegraph-executor-nodes'
  metrics_path: /proxy
  params:
    module: [node]
  gce_sd_configs: *executor_gce_config
  relabel_configs: *executor_relabel_config

# If you've also used the Terraform modules to provision Docker registry
# mirrors for executors:
- job_name: 'sourcegraph-executors-docker-registry-mirrors'
  metrics_path: /proxy
  params:
    module: [registry]
  gce_sd_configs: &gce_executor_mirror_config
    - project: {GCP_PROJECT} # Change this to the GCP project ID
      port: 9999
      zone: {GCP_ZONE} # Change this to the GCP zone
      filter: '(labels.executor_tag = {INSTANCE_TAG}-docker-mirror)' # Change {INSTANCE_TAG} to the `executor_instance_tag` set in the Terraform modules
  relabel_configs: *executor_relabel_config
- job_name: 'sourcegraph-executors-docker-registry-mirror-nodes'
  metrics_path: /proxy
  params:
    module: [node]
  gce_sd_configs: *gce_executor_mirror_config
  relabel_configs: *executor_relabel_config
```
Step 4: Restart Prometheus.
If you currently have any executors or Docker registry mirrors running, you should start seeing metrics on the Executors dashboard in Grafana. Alternatively, you can check whether the executors can be scraped by port-forwarding the Prometheus UI to your local machine and checking in the UI.
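For example (a sketch; this assumes a Kubernetes deployment in which the Prometheus deployment is named prometheus and the container listens on the default port 9090):

```bash
# Forward the Prometheus UI to http://localhost:9090, then check Status > Targets
kubectl port-forward deployment/prometheus 9090:9090
```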
AWS
Credentials need to be added to the Prometheus container of your Sourcegraph deployment and a new scrape job needs to be added.
In a Kubernetes deployment, credentials can be added by setting the two secrets obtained from the credentials module in the last step as environment variables.
Step 1: Modify the Prometheus deployment manifest:
```yaml
containers:
  - name: prometheus
    # [...]
    env:
      - name: AWS_ACCESS_KEY_ID
        # The Terraform output for `instance_scraper_access_key_id`
        value: <THE_ACCESS_KEY_TO_CONFIGURE>
      - name: AWS_SECRET_ACCESS_KEY
        # The Terraform output for `instance_scraper_access_secret_key`
        value: <THE_SECRET_KEY_TO_CONFIGURE>
```
Step 2: Add the following scrape job that uses EC2 service discovery configuration to the Prometheus configuration. To do that, you can edit the Prometheus ConfigMap and modify the contents of the prometheus.yml file. Under scrape_configs: add the following and make sure to replace {AWS_REGION} and {INSTANCE_TAG}. The {INSTANCE_TAG} value must be the same as instance_tag.
```yaml
- job_name: 'sourcegraph-executors'
  metrics_path: /proxy
  params:
    module: [executor]
  ec2_sd_configs: &executor_ec2_config
    - region: {AWS_REGION} # Change this to the AWS region
      port: 9999
      filters:
        - name: tag:executor_tag
          values: [{INSTANCE_TAG}] # Change {INSTANCE_TAG} to the `executor_instance_tag` set in the Terraform modules
  relabel_configs: &executor_relabel_config
    - source_labels: [__meta_ec2_public_ip]
      target_label: __address__
      replacement: "${1}${2}:9999"
      separator: ''
    - source_labels: [__meta_ec2_availability_zone]
      regex: ".+/([^/]+)"
      target_label: zone
      separator: ''
    - source_labels: [__meta_ec2_instance_id]
      target_label: instance
      separator: ''
    - source_labels: [__meta_ec2_ami]
      target_label: version
- job_name: 'sourcegraph-executor-nodes'
  metrics_path: /proxy
  params:
    module: [node]
  ec2_sd_configs: *executor_ec2_config
  relabel_configs: *executor_relabel_config

# If you've also used the Terraform modules to provision Docker registry
# mirrors for executors:
- job_name: 'sourcegraph-executors-docker-registry-mirrors'
  metrics_path: /proxy
  params:
    module: [registry]
  ec2_sd_configs: &ec2_executor_mirror_config
    - region: {AWS_REGION}
      port: 9999
      filters:
        - name: tag:executor_tag
          values: [{INSTANCE_TAG}-docker-mirror]
  relabel_configs: *executor_relabel_config
- job_name: 'sourcegraph-executors-docker-registry-mirror-nodes'
  metrics_path: /proxy
  params:
    module: [node]
  ec2_sd_configs: *ec2_executor_mirror_config
  relabel_configs: *executor_relabel_config
```
Step 3: Restart Prometheus.
If you currently have any executors or Docker registry mirrors running, you should start seeing metrics on the Executors dashboard in Grafana. Alternatively, you can check whether the executors can be scraped by port-forwarding the Prometheus UI to your local machine and checking in the UI.