Install Sourcegraph with Docker Compose on AWS

This guide shows you how to deploy Sourcegraph via Docker Compose to a single EC2 instance on AWS.

Determine server and service requirements

Use the resource estimator to determine the resource requirements for your environment. You will use this information to set up the instance and configure the docker-compose YAML file.

Prepare a fork

We strongly recommend that you create and run Sourcegraph from your own fork of the reference repository. You will make your changes to the default configuration, such as edits to the docker-compose YAML file, in that fork. The fork also lets you keep track of your customizations when you later upgrade it from the reference repository. The following steps for preparing your fork use GitHub as an example; complete them, then return to this page:

  1. Fork the reference repo
  2. Clone your fork
  3. Configure the release branch
  4. Configure the YAML file
  5. Publish changes to your branch
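The clone and release-branch steps above can be sketched with plain git, assuming your fork already exists on your Git host. In this sketch `prepare_fork` is a hypothetical helper, and the `release` branch name and all arguments are placeholders for your own values.

```shell
#!/usr/bin/env bash
# Sketch: clone your fork of deploy-sourcegraph-docker and start a branch from an
# upstream version tag. prepare_fork and the default branch name are hypothetical.
set -euo pipefail

prepare_fork() {
  local fork_url="$1"           # URL of your fork
  local upstream_url="$2"       # URL of the reference repository
  local version_tag="$3"        # version to deploy, e.g. v3.43.2
  local checkout_dir="$4"       # where to clone
  local branch="${5:-release}"  # placeholder name for your customization branch

  git clone "${fork_url}" "${checkout_dir}"
  # Keep a remote pointing at the reference repository so newer tags can be fetched later.
  git -C "${checkout_dir}" remote add upstream "${upstream_url}"
  git -C "${checkout_dir}" fetch --tags upstream
  # Create the branch at the tagged version; configuration changes are committed here.
  git -C "${checkout_dir}" checkout -b "${branch}" "${version_tag}"
}
```

For example, `prepare_fork git@github.com:<you>/deploy-sourcegraph-docker.git https://github.com/sourcegraph/deploy-sourcegraph-docker.git v3.43.2 ~/deploy-sourcegraph-docker`, then `git push -u origin release` once your YAML changes are committed. Keeping the `upstream` remote makes newer reference tags easy to fetch when you upgrade.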

Deploy to EC2

  • Click Launch Instance from your EC2 dashboard.
  • Select the Amazon Linux 2 AMI (HVM), SSD Volume Type.
  • Select an appropriate instance size (use the resource estimator to find a good starting point for your deployment), then click Next: Configure Instance Details.
  • Ensure the Auto-assign Public IP option is set to "Enable". This makes your instance reachable from the Internet.
  • Paste the following script into the User Data text box at the bottom of the Configure Instance Details page:


#!/usr/bin/env bash

set -euxo pipefail

EBS_VOLUME_DEVICE_NAME='/dev/sdb'
DOCKER_DATA_ROOT='/mnt/docker-data'

DOCKER_COMPOSE_VERSION='1.29.2'
DEPLOY_SOURCEGRAPH_DOCKER_CHECKOUT='/home/ec2-user/deploy-sourcegraph-docker'

# 🚨 Update these variables with the correct values from your fork!
DEPLOY_SOURCEGRAPH_DOCKER_FORK_CLONE_URL='https://github.com/sourcegraph/deploy-sourcegraph-docker.git'
DEPLOY_SOURCEGRAPH_DOCKER_FORK_REVISION='v3.43.2'

# Install git
yum update -y
yum install git -y

# Clone Docker Compose definition
git clone "${DEPLOY_SOURCEGRAPH_DOCKER_FORK_CLONE_URL}" "${DEPLOY_SOURCEGRAPH_DOCKER_CHECKOUT}"
cd "${DEPLOY_SOURCEGRAPH_DOCKER_CHECKOUT}"
git checkout "${DEPLOY_SOURCEGRAPH_DOCKER_FORK_REVISION}"

# Format (if necessary) and mount EBS volume
device_fs=$(lsblk "${EBS_VOLUME_DEVICE_NAME}" --noheadings --output fsType)
if [ "${device_fs}" == "" ] ## only format the volume if it isn't already formatted
then
  mkfs -t xfs "${EBS_VOLUME_DEVICE_NAME}"
fi
mkdir -p "${DOCKER_DATA_ROOT}"
mount "${EBS_VOLUME_DEVICE_NAME}" "${DOCKER_DATA_ROOT}"

# Mount EBS volume on reboots
EBS_UUID=$(blkid -s UUID -o value "${EBS_VOLUME_DEVICE_NAME}")
echo "UUID=${EBS_UUID}  ${DOCKER_DATA_ROOT}  xfs  defaults,nofail  0  2" >> '/etc/fstab'
umount "${DOCKER_DATA_ROOT}"
mount -a

# Install, configure, and enable Docker
yum update -y
amazon-linux-extras install docker
systemctl enable --now docker
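## raise the default open-file (nofile) ulimit set in /etc/sysconfig/docker;
## this takes effect when Docker is restarted further below
sed -i -e 's/1024/262144/g' /etc/sysconfig/docker
sed -i -e 's/4096/262144/g' /etc/sysconfig/docker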
usermod -a -G docker ec2-user

# Install jq for scripting
yum install -y jq

# Edit Docker storage directory to mounted volume
DOCKER_DAEMON_CONFIG_FILE='/etc/docker/daemon.json'

## initialize the config file with empty json if it doesn't exist
if [ ! -f "${DOCKER_DAEMON_CONFIG_FILE}" ]
then
  mkdir -p "$(dirname "${DOCKER_DAEMON_CONFIG_FILE}")"
  echo '{}' > "${DOCKER_DAEMON_CONFIG_FILE}"
fi

## update Docker's 'data-root' to point to our mounted disk
tmp_config=$(mktemp)
trap "rm -f ${tmp_config}" EXIT
jq --arg DATA_ROOT "${DOCKER_DATA_ROOT}" '.["data-root"]=$DATA_ROOT' "${DOCKER_DAEMON_CONFIG_FILE}" > "${tmp_config}"
cat "${tmp_config}" > "${DOCKER_DAEMON_CONFIG_FILE}"

## finally, restart Docker daemon to pick up our changes
systemctl restart docker

# Install Docker Compose
## -f makes curl fail on HTTP errors, so set -e stops the script instead of saving an error page
curl -fsSL "https://github.com/docker/compose/releases/download/${DOCKER_COMPOSE_VERSION}/docker-compose-$(uname -s)-$(uname -m)" -o /usr/local/bin/docker-compose
chmod +x /usr/local/bin/docker-compose
curl -fsSL "https://raw.githubusercontent.com/docker/compose/${DOCKER_COMPOSE_VERSION}/contrib/completion/bash/docker-compose" -o /etc/bash_completion.d/docker-compose

# Run Sourcegraph. Restart the containers upon reboot.
cd "${DEPLOY_SOURCEGRAPH_DOCKER_CHECKOUT}"/docker-compose
docker-compose up -d

  • Select Next: Add Storage.

  • Click "Add New Volume" and add an additional volume (for storing Docker data) with the following settings:

    • Volume Type (left-most column): EBS
    • IMPORTANT: Device: /dev/sdb
    • Size (GiB): 250 GiB minimum (as a rule of thumb, Sourcegraph needs at least as much disk space as all of your repositories combined; allocating as much space as you can up front helps you avoid resizing the volume later on)
    • Volume Type: General Purpose SSD (gp2)
    • Delete on Termination: Leave this setting unchecked
  • Select Next: ... until you get to the Configure Security Group page. Then add the following rules:

  • Launch your instance, then navigate to its public IP in your browser. (You can find it on the instance's page in EC2, in the "Description" panel under "IPv4 Public IP".) You may have to wait a minute or two for the instance to finish initializing before Sourcegraph becomes accessible. To monitor progress, SSH into the instance and run the following diagnostic commands:
# Follow the status of the user data script you provided earlier
tail -f /var/log/cloud-init-output.log

# (Once the user data script completes) monitor the health of the "sourcegraph-frontend" container
docker ps --filter="name=sourcegraph-frontend-0"
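Rather than refreshing the browser, you can poll for readiness from the instance itself. The sketch below is a generic retry helper; `wait_for` is a hypothetical name, and the usage line assumes Sourcegraph answers plain HTTP on port 80 of the instance.

```shell
#!/usr/bin/env bash
# Sketch: retry a command until it succeeds or the attempts run out.
# wait_for is a hypothetical helper, not part of Sourcegraph or Docker.
set -euo pipefail

# Usage: wait_for <max_attempts> <delay_seconds> <command> [args...]
# Returns 0 as soon as the command succeeds, 1 if every attempt fails.
wait_for() {
  local max_attempts="$1" delay="$2"
  shift 2
  local attempt
  for ((attempt = 1; attempt <= max_attempts; attempt++)); do
    if "$@"; then
      return 0
    fi
    # Only sleep when another attempt will follow.
    if ((attempt < max_attempts)); then
      sleep "${delay}"
    fi
  done
  return 1
}
```

For example, `wait_for 60 5 curl -fsS -o /dev/null http://localhost/` retries every 5 seconds for roughly 5 minutes and returns non-zero if Sourcegraph never responds.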
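The disk-size rule of thumb from the Add Storage step can be written as a small calculation: take the larger of the 250 GiB minimum and the combined size of your repositories, rounded up to whole GiB. `recommended_volume_gib` is a hypothetical helper, and its result is a floor rather than a target, since the guide advises allocating as much as you can up front.

```shell
#!/usr/bin/env bash
# Sketch of the sizing rule of thumb: at least the combined repository size,
# and never below the 250 GiB minimum. recommended_volume_gib is hypothetical.
set -euo pipefail

# Usage: recommended_volume_gib <total_repo_size_in_KiB>
recommended_volume_gib() {
  local repo_kib="$1"
  local min_gib=250
  # Round KiB up to whole GiB (1 GiB = 1048576 KiB).
  local repo_gib=$(( (repo_kib + 1048575) / 1048576 ))
  if (( repo_gib > min_gib )); then
    echo "${repo_gib}"
  else
    echo "${min_gib}"
  fi
}
```

If you have local clones of the repositories you plan to index (the path here is a placeholder), `recommended_volume_gib "$(du -sk /path/to/repos | cut -f1)"` feeds their on-disk size in KiB into the helper.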

Update your Sourcegraph version

Refer to the Docker Compose upgrade docs.

Storage and backups

The Sourcegraph Docker Compose definition uses Docker volumes to store its data. The user data script above configures Docker to keep all of its data on the additional EBS volume attached to the instance, which is mounted at /mnt/docker-data (the Docker volumes themselves live under /mnt/docker-data/volumes). There are a few ways to back up this data:

  • (recommended) The most straightforward way to back up this data is to snapshot the entire EBS volume mounted at /mnt/docker-data on an automatic schedule.

  • Using an external Postgres instance lets a managed service such as Amazon RDS for PostgreSQL back up all of Sourcegraph's user data for you. If the EC2 instance running Sourcegraph ever dies or is destroyed, a fresh instance connected to the same external Postgres database will return Sourcegraph to the state it was in before.
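For the snapshot approach, a one-off snapshot of the Docker data volume can be taken with the AWS CLI. This sketch assumes the AWS CLI is installed and allowed to call `ec2 create-snapshot`; `snapshot_docker_volume` and the `AWS_CLI` override variable are hypothetical, the latter only there so the command can be dry-run.

```shell
#!/usr/bin/env bash
# Sketch: snapshot the EBS volume that holds /mnt/docker-data.
# snapshot_docker_volume and AWS_CLI are hypothetical; AWS_CLI defaults to "aws".
set -euo pipefail

# To look up the ID of the volume attached as /dev/sdb (instance ID is a placeholder):
#   aws ec2 describe-volumes \
#     --filters Name=attachment.instance-id,Values=i-0123456789abcdef0 Name=attachment.device,Values=/dev/sdb \
#     --query 'Volumes[0].VolumeId' --output text

snapshot_docker_volume() {
  local volume_id="$1"
  local stamp
  stamp="$(date -u +%Y-%m-%dT%H:%M:%SZ)"
  # Timestamped description makes snapshots easy to tell apart in the console.
  "${AWS_CLI:-aws}" ec2 create-snapshot \
    --volume-id "${volume_id}" \
    --description "sourcegraph docker-data ${stamp}"
}
```

For the automatic, scheduled part, Amazon Data Lifecycle Manager can create EBS snapshots on a schedule without any script; running a helper like this from cron is an alternative.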
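If you use an external Postgres, it helps to confirm the instance can reach it before starting Sourcegraph. This sketch assumes the connection is configured through the standard libpq variables (PGHOST, PGPORT, PGUSER, PGPASSWORD, PGDATABASE); verify the exact variable names against the Sourcegraph documentation for your version. `missing_pg_vars` and `check_external_pg` are hypothetical helpers, and `pg_isready` requires the PostgreSQL client tools on the instance.

```shell
#!/usr/bin/env bash
# Sketch: sanity-check external Postgres settings before starting Sourcegraph.
# Assumes libpq-style PG* variables; the helper names are hypothetical.
set -euo pipefail

# Prints the names of required PG* variables that are unset or empty.
missing_pg_vars() {
  local name missing=()
  for name in PGHOST PGPORT PGUSER PGPASSWORD PGDATABASE; do
    if [ -z "${!name:-}" ]; then
      missing+=("${name}")
    fi
  done
  echo "${missing[*]:-}"
}

check_external_pg() {
  local missing
  missing="$(missing_pg_vars)"
  if [ -n "${missing}" ]; then
    echo "missing: ${missing}" >&2
    return 1
  fi
  # pg_isready exits 0 when the server is accepting connections.
  pg_isready -h "${PGHOST}" -p "${PGPORT}" -U "${PGUSER}" -d "${PGDATABASE}"
}
```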