Current limitations of Code Insights

There are a few existing limitations.

If you have strong feedback, please do let us know.

Limitations that are no longer current are documented at the bottom for the benefit of customers who have not yet upgraded.

Insight chart position and size do not persist

You can resize and reorder charts on the dashboard for the purpose of taking a screenshot or presenting information, but that order will revert on a page refresh.

If the ordering of insights is important, you can remove and then re-add the insights in the order you'd like via the add/remove insights to dashboard flow.

If the size is important, you can use the single-insight view page to consistently view an insight at a larger size. You can reach it by clicking the insight title, or from the three-dot context menu on the insight card under "Get shareable link".

Performance speed considerations for a data series running over all repositories

To accurately return historical data for insights running over all of your repositories, the backend service must run a large number of Sourcegraph searches. This means that unlike code insights running over just a few repositories, results are not returned instantly, but more often on the scale of 20-120 minutes, depending on:

N: how many repositories you have connected to your instance; in our tests, we used 26,400 repositories
q: the performance and resources of your Sourcegraph code insights instance in queries-per-second; in our tests, 7 queries per second was average
c: how well we can "compress" repositories so we don't need to re-run queries every month (e.g., if a repository hasn't changed in two months); in our tests, c = ~2

A very general formula for estimating how long an individual data series (query) will take to run on your instance, in seconds, is: N * 1/c * 1/q.

On our test instance, we find a code insight data series takes approximately:

26,400 repositories * 1/2 compression factor * 1/7 queries per second = 31 minutes
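
The estimate above can also be sketched as a small Python helper. The function and parameter names are illustrative only and are not part of any Sourcegraph API; the inputs are the test-instance figures quoted above, and the result is a rough approximation.

```python
def estimate_series_runtime_seconds(
    num_repos: int, compression: float, queries_per_second: float
) -> float:
    """Rough runtime of one data series in seconds: N * 1/c * 1/q."""
    if compression <= 0 or queries_per_second <= 0:
        raise ValueError("compression and queries_per_second must be positive")
    return num_repos / compression / queries_per_second


# Figures from the test instance described above: N = 26,400, c = 2, q = 7.
seconds = estimate_series_runtime_seconds(26_400, 2, 7)
print(f"{seconds:.0f} seconds, about {seconds / 60:.0f} minutes")
# prints "1886 seconds, about 31 minutes"
```

Substituting your own repository count and an observed queries-per-second figure should give a ballpark for your instance; the compression factor is the hardest input to know in advance.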

The number of insights you have does not affect the overall speed at which they run: it will take the same total time to run all of them whether or not you let each one finish before creating the next one. Insights currently populate in parallel, prioritizing most-recent-in-time datapoints first.

Creating insights over very large repositories (<3.42)

In some cases, depending on the size of the Sourcegraph instance and the size of the repo, you may see odd behavior or timeout errors if you try to create a code insight running over a single large repository. In this case, it's best to try:

Create the insight, but check the box to "run over all repositories." (This sends the Insight backfilling jobs to the backend Sourcegraph instance worker which will handle them datapoint-by-datapoint. Running over an individual repository otherwise currently runs the jobs in bulk to generate its live preview.)
After the insight has finished running, filter the insight to the specific repo you originally wanted to use. The filter resolves instantly.

If this does not solve your problem, please reach out directly to your Sourcegraph contact or in your shared Slack channel, as there are experimental solutions we are currently working on to further improve our handling of large repositories.

Accuracy considerations for an insight query returning a large result set

If you create an insight with a search query that returns a result set large enough to exceed the search timeout (generally when there are over 1,000,000 results), non-historical data points may report undercounted numbers. This is because non-historical data points are recorded with a single global search query, as opposed to the per-repo queries we run for backfilling. For a large result set (e.g. a query for "test" with millions of results), the global query is more likely to hit the global search timeout. This behaviour is tracked in this issue. You can find more information on search timeouts in the docs.

You can check whether this issue may be affecting your query by running it in the Search UI on /search with count:all. If the search takes about 60s to return its results (or about as long as the maximum timeout your instance is configured to allow), it will time out on insights as well. The reported duration may be slightly more or less than 60s; for example, you could see 60.02s.

In this case, you may want to try:

Using a more granular query
Changing your site configuration so that the timeout is increased, provided your instance setup allows it. More information on timeouts.

General scale limitations

Note: We are working on improvements for the items below in FY23Q4.

Code Insights is disabled by default on single-docker deployment methods.

There are a few factors to consider with respect to scale and expected performance.

General permissiveness - instances that are more open (users can see most repos) will perform better than instances that are more restricted. It is possible to have enough restricted repositories that users cannot render Code Insights.
Number of repositories - Code Insights is well tested to ~35,000 repositories. Users should expect at least linear degradation as repository count grows, in both time to calculate insights and render performance.
Large monorepos - Code Insights allocates a fixed amount of time for each query, so large repositories that cause query timeouts will likely not have exhaustive (and therefore accurate) results. Until we add more visibility to this state, a heuristic indicator of this problem is seeing values "jump" (either a significant increase or decrease) between the backfilled datapoints on creation and the up-to-date datapoints added after creation.
High cardinality capture groups - When using a capture group insight, high cardinality matches (for example 1000 distinct matches per repository) will cause a significant increase in chart loading times. It is possible to exceed request timeouts if there are too many distinct matches.
Concurrent usage
If there are many insight creators the insights will take longer to calculate.
If there are more insight viewers loading times of charts may be impacted.
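
The "jump" heuristic mentioned under large monorepos can be checked by eye on the chart. If you have the datapoint values to hand, a rough version of the same check might look like the sketch below. The function name and the 50% threshold are assumptions chosen for illustration; Code Insights does not define a specific threshold.

```python
def looks_like_jump(
    last_backfilled: float, first_live: float, threshold: float = 0.5
) -> bool:
    """Return True if the relative change between the two values exceeds threshold.

    threshold is a hypothetical cut-off (0.5 means a 50% change), not a
    value defined by Code Insights.
    """
    if last_backfilled == 0:
        # Any non-zero value after a zero baseline counts as a jump.
        return first_live != 0
    return abs(first_live - last_backfilled) / abs(last_backfilled) > threshold


# Hypothetical values: the last backfilled datapoint vs. the first one added after creation.
print(looks_like_jump(10_000, 4_200))   # True: a 58% drop
print(looks_like_jump(10_000, 10_400))  # False: a 4% increase
```

A jump flagged this way is only a hint that queries may be timing out; a genuine large change in the codebase would look the same.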

Creating insights over specific branches and revisions

Code Insights does not yet support running over specific revisions.

Feature parity limitations

Features currently available only on insights over all your repositories

Filtering insights: available in 3.41+ ~~we do not yet allow filtering for insights that run over explicitly defined lists of repositories, except for "detect and track" insights.~~

Features currently available only on insights over explicitly defined repository lists

Because these insights need to run dramatically fewer queries than insights over thousands of repositories, you will have access to a number of features not yet supported for insights over all repositories. These are:

Live previews: showing the preview of your insight in real time
[Released] Dynamic x-axis ranges: available in 3.35+ ~~set a custom amount of historical data you care about~~
[Released] Editing data series queries after creation: available in 3.35+ ~~for insights over all repositories, you must make a new insight if you wish to run a different query~~
[Released] "Diff click": available in 3.36+ ~~click a datapoint on your insight and be taken to a diff search showing any changes contributing to the difference between a datapoint and the prior one~~

Limitations specific to "Detect and track patterns" insights (automatically generated data series)

Please see Current limitations of automatically generated data series.

In certain cases, chart datapoints don't match the result count of a Sourcegraph search

There are currently a few subtle differences in how code insights and Sourcegraph web app searches handle defaults when searching over all repositories. Refer to Common reasons code insights may not match search results.

Known bugs

Known bugs we plan to fix are tracked in our GitHub repository here.

Older versions' limitations

Version 3.30 (July 2021) or older

Search-based Code Insights can only run over ~50-70 repositories

Because this version of the prototype runs on frontend API calls to Sourcegraph searches, it may run slowly (or possibly time out) if you're using it over many repositories or with many data series for each insight.

The max match count is 5,000 matches per repository

The current limit on searching over historical versions of repositories, which is an unindexed search, is 5,000 results per repository. If there are more than 5,000 matches, the search stops and returns a count of 5,000, and the code insight graph will calculate the overall chart using 5,000 as the match count for that repository. (This means if you query over two repositories and one of them hits this limit, the value shown on the graph will be 5,000 + [the match count in the other repository]).
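
As a minimal sketch of the arithmetic described above, the charted value behaves like a sum of per-repository counts, each capped at 5,000. The repository names and match counts below are made up for illustration, and the helper is not actual Sourcegraph code.

```python
PER_REPO_MATCH_LIMIT = 5_000  # per-repository cap in version 3.30 and older


def charted_value(
    matches_per_repo: dict[str, int], limit: int = PER_REPO_MATCH_LIMIT
) -> int:
    """Sum per-repository match counts, capping each repository at the limit."""
    return sum(min(count, limit) for count in matches_per_repo.values())


# Hypothetical repositories and true match counts.
true_counts = {"repo-a": 12_000, "repo-b": 300}
print(charted_value(true_counts))  # 5300, although the true total is 12300
```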

This limit was lifted in the August 2021 release of Sourcegraph 3.31.