Back up or migrate Sourcegraph data to a new instance
In some circumstances it may be necessary or advantageous to migrate from one Sourcegraph instance or deployment to another. This page describes how to execute such a migration.
Specific guides
Data stores
Sourcegraph stores its state in several locations. Much of this data can be regenerated, but some of it cannot, so each location is described below.
Configuration JSON
Most of Sourcegraph's configuration is managed through text editors in the web UI. These files are typically stored in the Postgres database (described below), but are rendered as text for editing in the web UI.
These files are the most essential pieces of information required for a migration to work.
Data | Can it be recreated without a backup? | Notes |
---|---|---|
Site configuration | No | This file contains key configuration that defines how the product works. |
Code host connection configuration(s) | No | Each connection to an external code host has its own short configuration file. |
Global settings | No | Default settings can be set by administrators for all users by editing this file. |
To back up this data, copy the text of each file described above from the old Sourcegraph instance. To migrate, paste that text into the corresponding editor on the new instance.
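The copied text can be kept in files so that the backup is repeatable. The following is a minimal sketch and not part of Sourcegraph: the function name `save_config_backup`, the entry names, and the directory layout are all hypothetical. It stores the text verbatim and does not parse it, on the assumption that the configuration may contain comments and therefore may not be strict JSON.

```python
import re
from datetime import datetime, timezone
from pathlib import Path


def save_config_backup(texts: dict[str, str], backup_root: Path) -> Path:
    """Write each copied configuration text to a timestamped directory.

    `texts` maps a short name chosen by the admin (for example
    "site-config" or "global-settings") to the text copied from the
    web UI. Both the names and the layout are illustrative only.
    """
    for name in texts:
        # Restrict names so they are always safe to use as file names.
        if not re.fullmatch(r"[A-Za-z0-9_-]+", name):
            raise ValueError(f"unsafe entry name: {name!r}")

    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    target = backup_root / f"sourcegraph-config-{stamp}"
    target.mkdir(parents=True, exist_ok=False)

    for name, text in texts.items():
        # Stored verbatim: the text is not parsed or reformatted.
        (target / f"{name}.json").write_text(text, encoding="utf-8")
    return target
```

Each code host connection has its own configuration, so in this sketch each connection would be saved under its own entry name.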
Internal database (PostgreSQL)
Sourcegraph's internal database houses most of Sourcegraph's state. While many of these pieces of data can be restored after a migration, some cannot.
This list is not guaranteed to be complete, but rather representative of the types of data stored here.
Data | Can it be recreated without a backup? | Notes |
---|---|---|
Repository metadata (e.g. clone URLs, whether it is a fork or archive, etc.) | Yes | |
User accounts | Yes if using SSO authentication; no if using builtin authentication | |
Private Sourcegraph extensions | Yes | Only used if your Sourcegraph instance uses a private extension registry. Ensure that any private extension code is backed up. |
Repository permissions | Yes | |
Organizations | No | |
User and org settings | No | Global settings can be backed up as described above, but user- and org-level settings cannot. |
Saved searches | No | |
User-generated access tokens | No | |
Batch Changes | No | |
Code graph metadata | Yes (if manually regenerated) | This can be regenerated by re-running the indexing and upload process for affected repositories and revisions, but will not be regenerated by default. |
User survey responses | No | |
Usage statistics and event logs | No | Event logs allow admins to track and audit usage, but are not necessary for Sourcegraph to work. |
Data stored on disk
Git data, search indexes, precise code-intel data, Prometheus metrics, and some other large data sources are stored on disk.
This list is not guaranteed to be complete, but rather representative of the types of data stored here.
Data | Can it be recreated without a backup? | Notes |
---|---|---|
Repository (git) data | Yes | |
Search indexes | Yes | |
Code graph data | Yes (if manually regenerated) | This can be regenerated by re-running the indexing and upload process for affected repositories and revisions, but will not be regenerated by default. |
Prometheus metrics | No | |
MinIO | Yes | This is where unprocessed uploads are stored. |
Ephemeral data (Redis)
Short-lived data, including session data and some usage statistics, is stored in Redis. All of this data can be recreated without backups.
External data
Certain categories of data can be stored outside of the Sourcegraph deployment. For example, configuration JSON files can be loaded from disk, and Sourcegraph can connect to external services (PostgreSQL, Redis, S3/GCS) instead of using PostgreSQL, Redis, and MinIO internally.
In these cases, no migration should be necessary. Simply reuse the existing external data sources on the new Sourcegraph instance.
Migration and backup options
Option 1: Configuration only
The easiest option is to back up or migrate only the configuration JSON data. Copy the configuration files listed above; after the new Sourcegraph instance starts, paste them into its UI.
Option 2: All Postgres data
This option provides a more complete backup and ensures that almost all state will be restored. Repositories will have to be recloned and reindexed, so some downtime will be required while these operations complete.
Follow the instructions in our Docker to Docker Compose migration guide to generate a dump of Sourcegraph's Postgres database. Contact us for specific recommendations for your deployment type.
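For orientation, a database dump of this kind is commonly produced with PostgreSQL's `pg_dump` tool. The sketch below is not the procedure from the linked guide; it only illustrates the idea. The helper names are hypothetical, and the default role and database name (`sg`) are assumptions that should be checked against the actual deployment.

```python
import subprocess
from pathlib import Path


def build_pg_dump_command(
    output_file: Path,
    host: str = "localhost",
    port: int = 5432,
    user: str = "sg",      # assumed role name; verify for your deployment
    database: str = "sg",  # assumed database name; verify for your deployment
) -> list[str]:
    """Return a pg_dump invocation that writes a custom-format archive."""
    return [
        "pg_dump",
        f"--host={host}",
        f"--port={port}",
        f"--username={user}",
        f"--dbname={database}",
        "--format=custom",  # archive format that pg_restore can read
        f"--file={output_file}",
    ]


def run_dump(command: list[str]) -> None:
    """Run the dump; raises CalledProcessError if pg_dump exits non-zero.

    pg_dump takes the password from the PGPASSWORD environment variable
    or a ~/.pgpass file, so it is not passed on the command line here.
    """
    subprocess.run(command, check=True)
```

Building the command separately from running it makes it possible to review or log the exact invocation before it touches the database.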
Option 3: All data
Backing up all persistent volumes is the most complete option. Instructions for doing this depend on the deployment method and the cloud host. Contact us to discuss more.
Persistent data backup in Kubernetes
Use the table below for reference when migrating your data from a Kubernetes cluster:
Name | Recreatable | Notes |
---|---|---|
codeinsight-db | Yes | |
codeintel-db | Yes | While the data is recreatable, we suggest including the disk in your migration because it often contains a lot of data that would take a while to regenerate. |
indexed-search | Yes | |
gitserver | Yes | |
grafana | Yes | |
minio | Yes | |
pgsql | No | This is Sourcegraph's main database, where most of the data is stored. |
prometheus | Yes | |
redis-cache | Yes | |
redis-store | Yes | |
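The table above can be encoded as data to derive a backup checklist. This is a minimal sketch: the `Volume` class, the `costly_to_regenerate` flag, and the `backup_plan` function are hypothetical names introduced for illustration, and the values simply mirror the table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Volume:
    name: str
    recreatable: bool
    # True when the table suggests migrating the disk anyway.
    costly_to_regenerate: bool = False


# Values copied from the table above.
VOLUMES = [
    Volume("codeinsight-db", True),
    Volume("codeintel-db", True, costly_to_regenerate=True),
    Volume("indexed-search", True),
    Volume("gitserver", True),
    Volume("grafana", True),
    Volume("minio", True),
    Volume("pgsql", False),
    Volume("prometheus", True),
    Volume("redis-cache", True),
    Volume("redis-store", True),
]


def backup_plan(volumes: list[Volume]) -> dict[str, list[str]]:
    """Group volumes by how important it is to include them in a backup."""
    plan: dict[str, list[str]] = {"required": [], "recommended": [], "optional": []}
    for volume in volumes:
        if not volume.recreatable:
            plan["required"].append(volume.name)
        elif volume.costly_to_regenerate:
            plan["recommended"].append(volume.name)
        else:
            plan["optional"].append(volume.name)
    return plan
```

With the values above, `backup_plan(VOLUMES)` lists `pgsql` as required and `codeintel-db` as recommended, and treats the remaining volumes as optional.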