Auto-indexing inference configuration reference
This document details how a site administrator can supply a Lua script to customize the way Sourcegraph detects precise code intelligence indexing jobs from repository contents.
By default, Sourcegraph will attempt to infer index jobs for the following languages:
Go
Java/Scala/Kotlin
Python
Ruby
Rust
TypeScript/JavaScript
Inference logic can be disabled or altered when the target repositories do not follow a pattern that Sourcegraph's default inference logic recognizes. Inference logic is controlled by a Lua override script, which can be supplied in the UI under Admin > Code graph > Inference.
Example
The Lua override script must ultimately return an auto-indexing config object. A configuration that neither disables nor adds recognizers leaves the default inference behavior unchanged.
```lua
return require("sg.autoindex.config").new({
  -- Empty configuration (see below for usage)
})
```
To disable a default behavior, re-assign the corresponding recognizer value to `false`. Each built-in recognizer name is prefixed with `sg.` (and built-in recognizers are the only ones allowed to use that prefix).
```lua
return require("sg.autoindex.config").new({
  -- Disable default Python inference
  ["sg.python"] = false,
})
```
To add new behaviors, create and register a new recognizer. A recognizer is an interface that requests a set of files from a repository and returns a set of auto-indexing job configurations that could produce a precise code intelligence index.
A path recognizer is a concrete recognizer that advertises a set of path patterns it is interested in, then invokes its `generate` function with the matching paths from a repository. In the following example, all files matching `Snek.module` (`Snek.module`, `proj/Snek.module`, `proj/sub/Snek.module`, etc.), except those inside `test` or `vendor` directories, are passed to a call to `generate` (if there are any). The `generate` function then returns a list of indexing job descriptions. The guide for auto-indexing jobs configuration describes the fields of this object in detail.
```lua
local path = require("path")
local pattern = require("sg.autoindex.patterns")
local recognizer = require("sg.autoindex.recognizer")

local snek_recognizer = recognizer.new_path_recognizer {
  patterns = {
    -- Look for Snek.module files
    -- (would match Snek.module; proj/Snek.module, proj/sub/Snek.module, etc)
    pattern.new_path_basename("Snek.module"),

    -- Ignore any files in test or vendor directories
    pattern.new_path_exclude(
      pattern.new_path_segment("test"),
      pattern.new_path_segment("vendor")
    ),
  },

  -- Called with list of matching Snek.module files
  generate = function(_, paths)
    local jobs = {}
    for i = 1, #paths do
      -- Create indexing job description for each matching file
      table.insert(jobs, {
        indexer = "acme/snek:latest",     -- Run this indexer...
        root = path.dirname(paths[i]),    -- ...in this directory
        local_steps = {"snekpm install"}, -- Install dependencies
        indexer_args = {"snek", "index", ".", "--output", "index.scip"},
        outfile = "index.scip",
      })
    end
    return jobs
  end
}

return require("sg.autoindex.config").new({
  -- Register new recognizer
  ["acme.snek"] = snek_recognizer,
})
```
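Disabling a built-in recognizer and registering a new one can be combined in a single override script. The sketch below turns off the built-in Python inference and registers a replacement under a custom name. It assumes a hypothetical repository layout where every Python project has a `pyproject.toml` file. The image `example/python-indexer:latest`, its arguments, and the `pip install .` step are placeholders, not Sourcegraph defaults.

```lua
local path = require("path")
local pattern = require("sg.autoindex.patterns")
local recognizer = require("sg.autoindex.recognizer")

-- Hypothetical replacement for the built-in Python recognizer:
-- one indexing job per directory that contains a pyproject.toml file.
local custom_python_recognizer = recognizer.new_path_recognizer {
  patterns = {
    pattern.new_path_basename("pyproject.toml"),
  },

  generate = function(_, paths)
    local jobs = {}
    for _, p in ipairs(paths) do
      table.insert(jobs, {
        indexer = "example/python-indexer:latest",                   -- placeholder image
        root = path.dirname(p),                                      -- project directory
        local_steps = {"pip install ."},                             -- placeholder dependency step
        indexer_args = {"python-indexer", "--output", "index.scip"}, -- placeholder CLI
        outfile = "index.scip",
      })
    end
    return jobs
  end,
}

return require("sg.autoindex.config").new({
  ["sg.python"] = false,                      -- turn off the default Python inference
  ["acme.python"] = custom_python_recognizer, -- custom names must not use the "sg." prefix
})
```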
Available libraries
A number of auto-indexing-specific and general-purpose Lua libraries are accessible via the built-in `require`.
The type signatures for the functions below use the following syntax:
- `(A1, ..., An) -> R`: Function type with arguments of type `A1, ..., An` and return type `R`.
- `array[A]`: Table with indexes 1 to N of elements of type `A`.
- `table[K, V]`: Table with keys of type `K` and values of type `V`.
- `A | B`: Union type (includes values of type `A` and type `B`).
- `A...`: Variadic (0 or more values of `A`, without being wrapped in a table).
- `"mystring"`: Literal string type with only `"mystring"` as the allowed value.
- `{K1: V1, K2: V2, ...}`: Heterogeneous table (object) with a key of type `K1` mapping to a value of type `V1`, etc.
- `void`: No value returned from the function.
sg.autoindex.recognizer
This auto-indexing-specific library defines the following two functions.
- `new_path_recognizer` creates a `recognizer` from a config object containing `patterns` and `generate` fields (`patterns_for_content` may also be supplied). See the example above for basic usage.
  - Type:

```
({
  "patterns": array[pattern],
  "patterns_for_content": array[pattern],
  "generate": (registration_api, paths: array[string], contents_by_path: table[string, string]) -> array[index_job],
}) -> recognizer
```

where `index_job` is an object with the following shape:

```
index_job = {
  "indexer": string,             -- Docker image for the indexer
  "root": string,                -- working directory for invoking the indexer
  "steps": array[{               -- preparatory steps to run before invoking the indexer
                                 -- (e.g. installing dependencies)
    "root": string,              -- working directory for this step
    "image": string,             -- Docker image to use for this step
    "commands": array[string],   -- list of commands to run inside the Docker image
  }],
  "local_steps": array[string],  -- list of commands to run inside the indexer image at "root"
                                 -- before invoking the indexer (e.g. to install dependencies)
  "indexer_args": array[string], -- command-line invocation for the indexer
  "outfile": string,             -- path to the index generated by the indexer
  "requested_envvars": array[string], -- environment variables needed; these are made accessible
                                      -- to steps, local_steps, and the indexer_args command
}
```
For installing dependencies, if the indexer image contains the relevant package manager(s), it is simpler to install dependencies using `local_steps`. Otherwise, the `steps` field allows more customizability.
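If the indexer image does not ship the needed package manager, you can install dependencies in a separate `steps` entry. The sketch below emits one job description that uses both `steps` and `requested_envvars`. The image `acme/snekpm:latest`, its command, and the `SNEK_REGISTRY_TOKEN` variable are hypothetical. The sketch also assumes that files a step writes under the job's root directory are still visible when the indexer runs.

```lua
local path = require("path")
local pattern = require("sg.autoindex.patterns")
local recognizer = require("sg.autoindex.recognizer")

local snek_recognizer = recognizer.new_path_recognizer {
  patterns = { pattern.new_path_basename("Snek.module") },

  generate = function(_, paths)
    local jobs = {}
    for _, p in ipairs(paths) do
      local root = path.dirname(p)
      table.insert(jobs, {
        -- Install dependencies with a dedicated (hypothetical) package-manager
        -- image, since the indexer image is assumed not to include snekpm.
        steps = {
          {
            root = root,
            image = "acme/snekpm:latest",
            commands = {"snekpm install"},
          },
        },
        indexer = "acme/snek:latest",
        root = root,
        indexer_args = {"snek", "index", ".", "--output", "index.scip"},
        outfile = "index.scip",
        -- Hypothetical registry token, made available to the step above
        -- and to the indexer_args command.
        requested_envvars = {"SNEK_REGISTRY_TOKEN"},
      })
    end
    return jobs
  end,
}

return require("sg.autoindex.config").new({
  ["acme.snek"] = snek_recognizer,
})
```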
- `new_fallback_recognizer` creates a `recognizer` from an ordered list of `recognizer`s. Each `recognizer` is called in order until one of them emits non-empty results.
  - Type: `(array[recognizer]) -> recognizer`
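A fallback recognizer can prefer a precise, per-module layout and use a coarser one only when the first finds nothing. In this sketch, the `Snek.workspace` file name and the `.snek` extension are hypothetical. The second recognizer is consulted only if the first produces no jobs, so the same code is not indexed twice.

```lua
local path = require("path")
local pattern = require("sg.autoindex.patterns")
local recognizer = require("sg.autoindex.recognizer")

-- Builds one Snek indexing job description rooted at `root`.
local function snek_job(root)
  return {
    indexer = "acme/snek:latest",
    root = root,
    local_steps = {"snekpm install"},
    indexer_args = {"snek", "index", ".", "--output", "index.scip"},
    outfile = "index.scip",
  }
end

-- Preferred: one job per directory holding a (hypothetical) Snek.workspace file.
local workspace_recognizer = recognizer.new_path_recognizer {
  patterns = { pattern.new_path_basename("Snek.workspace") },
  generate = function(_, paths)
    local jobs = {}
    for _, p in ipairs(paths) do
      table.insert(jobs, snek_job(path.dirname(p)))
    end
    return jobs
  end,
}

-- Fallback: no workspace files, but .snek sources exist somewhere,
-- so emit a single job rooted at the repository root ("").
local repo_root_recognizer = recognizer.new_path_recognizer {
  patterns = { pattern.new_path_extension("snek") },
  generate = function(_, paths)
    if #paths == 0 then
      return {} -- defensive; generate is described as called only for non-empty matches
    end
    return { snek_job("") }
  end,
}

return require("sg.autoindex.config").new({
  ["acme.snek"] = recognizer.new_fallback_recognizer({
    workspace_recognizer,
    repo_root_recognizer,
  }),
})
```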
The `registration_api` object (passed as the first argument to `generate`) has the following API:

- `register` queues a `recognizer` to be run at a later stage. This makes it possible to add recognizers dynamically, for example depending on whether specific configuration files were found.
  - Type: `(recognizer) -> void`
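As an illustration of dynamic registration, the sketch below registers a Snek module recognizer only when a (hypothetical) `snek.lock` file exists somewhere in the repository. It assumes `register` is called with Lua method syntax (`api:register(...)`) on the object passed as the first argument to `generate`. Verify this against your Sourcegraph version. Because `generate` must return a list of jobs, the gating recognizer returns an empty list.

```lua
local path = require("path")
local pattern = require("sg.autoindex.patterns")
local recognizer = require("sg.autoindex.recognizer")

-- Emits one job per Snek.module file (same job shape as the earlier example).
local snek_module_recognizer = recognizer.new_path_recognizer {
  patterns = { pattern.new_path_basename("Snek.module") },
  generate = function(_, paths)
    local jobs = {}
    for _, p in ipairs(paths) do
      table.insert(jobs, {
        indexer = "acme/snek:latest",
        root = path.dirname(p),
        local_steps = {"snekpm install"},
        indexer_args = {"snek", "index", ".", "--output", "index.scip"},
        outfile = "index.scip",
      })
    end
    return jobs
  end,
}

-- Gatekeeper: its generate is only called when snek.lock files match.
local snek_lock_recognizer = recognizer.new_path_recognizer {
  patterns = { pattern.new_path_basename("snek.lock") },
  generate = function(api, _)
    -- Queue the module recognizer to run at a later stage
    -- (method-call syntax on `api` is an assumption).
    api:register(snek_module_recognizer)
    -- This recognizer contributes no jobs of its own.
    return {}
  end,
}

return require("sg.autoindex.config").new({
  ["acme.snek"] = snek_lock_recognizer,
})
```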
sg.autoindex.patterns
This auto-indexing-specific library defines the following four path pattern constructors.
- `new_path_literal(fullpath)` creates a `pattern` that matches an exact filepath.
  - Type: `(string) -> pattern`
- `new_path_segment(segment)` creates a `pattern` that matches a directory name.
  - Type: `(string) -> pattern`
- `new_path_basename(basename)` creates a `pattern` that matches a basename exactly.
  - Type: `(string) -> pattern`
- `new_path_extension(ext_no_leading_dot)` creates a `pattern` that matches files with a given extension.
  - Type: `(string) -> pattern`
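The sketch below uses each of the four constructors in one recognizer's `patterns` list. The comments describe what each pattern is intended to match, based on the descriptions above. All file and directory names are hypothetical. The sketch assumes that a path matching any positive pattern in the list is passed to `generate`, and that `new_path_exclude` filters paths out as in the Snek example.

```lua
local pattern = require("sg.autoindex.patterns")
local recognizer = require("sg.autoindex.recognizer")

local snek_files_recognizer = recognizer.new_path_recognizer {
  patterns = {
    -- Exactly one file, at this full path
    pattern.new_path_literal("build/snek.toml"),
    -- Any file named exactly "Snek.module", in any directory
    pattern.new_path_basename("Snek.module"),
    -- Any file with the .snek extension (given without the leading dot)
    pattern.new_path_extension("snek"),
    -- Drop anything inside a directory named "snek_modules"
    pattern.new_path_exclude(pattern.new_path_segment("snek_modules")),
  },

  generate = function(_, paths)
    -- Placeholder: a real recognizer would turn `paths` into index jobs here.
    return {}
  end,
}

return require("sg.autoindex.config").new({
  ["acme.snek-files"] = snek_files_recognizer,
})
```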
This library also defines the following two pattern collection constructors.
- `new_path_combine(patterns)` creates a pattern collection object (to be used with recognizers) from the given set of path `pattern`s.
  - Type: `((pattern | array[pattern])...) -> pattern`
- `new_path_exclude(patterns)` creates a new inverted pattern collection object. Paths matching these `pattern`s are filtered out from the set of matching filepaths given to a recognizer's `generate` function.
  - Type: `((pattern | array[pattern])...) -> pattern`
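`new_path_combine` can define a reusable group of patterns once, such as a shared set of directories to skip. Several recognizers can then pass that group to `new_path_exclude`. The directory names and both recognizers below are hypothetical. The sketch relies only on the documented signatures: each argument may be a single pattern or an array of patterns.

```lua
local pattern = require("sg.autoindex.patterns")
local recognizer = require("sg.autoindex.recognizer")

-- One reusable collection of directories to ignore.
local ignored_dirs = pattern.new_path_combine(
  pattern.new_path_segment("test"),
  pattern.new_path_segment("vendor"),
  { pattern.new_path_segment("examples"), pattern.new_path_segment("third_party") }
)

local snek_module_recognizer = recognizer.new_path_recognizer {
  patterns = {
    pattern.new_path_basename("Snek.module"),
    pattern.new_path_exclude(ignored_dirs),
  },
  generate = function(_, paths)
    return {} -- build jobs from the filtered paths here
  end,
}

local snek_workspace_recognizer = recognizer.new_path_recognizer {
  patterns = {
    pattern.new_path_basename("Snek.workspace"),
    pattern.new_path_exclude(ignored_dirs), -- same exclusions, defined once
  },
  generate = function(_, paths)
    return {} -- build jobs from the filtered paths here
  end,
}

return require("sg.autoindex.config").new({
  ["acme.snek-module"] = snek_module_recognizer,
  ["acme.snek-workspace"] = snek_workspace_recognizer,
})
```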
path
This library defines the following utility functions:
- `ancestors(path)` returns a list `{dirname(path), dirname(dirname(path)), ...}`. The last element in the list is an empty string.
  - Type: `(string) -> array[string]`
- `basename(path)` returns the basename of the given path as defined by Go's `filepath.Base`.
  - Type: `(string) -> string`
- `dirname(path)` returns the dirname of the given path as defined by Go's `filepath.Dir`, except that it (1) returns an empty path instead of `"."` if the path is empty and (2) removes a leading `/` if present.
  - Type: `(string) -> string`
- `join(path1, path2)` returns a filepath created by joining the given path segments with the filepath separator.
  - Type: `(string, string) -> string`
- `split(path)` is a convenience function that returns `dirname(path), basename(path)`.
  - Type: `(string) -> string, string`
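The `generate` function below shows how these helpers fit together. It keeps only the outermost `Snek.module` directories: a match is skipped when one of its enclosing directories also contains a `Snek.module`. The sketch relies on `ancestors` and `dirname` spelling directories the same way. Per the descriptions above, `ancestors` starts with `dirname(path)` and ends with the empty string. The nested-module policy itself is only an example.

```lua
local path = require("path")
local pattern = require("sg.autoindex.patterns")
local recognizer = require("sg.autoindex.recognizer")

local snek_recognizer = recognizer.new_path_recognizer {
  patterns = { pattern.new_path_basename("Snek.module") },

  generate = function(_, paths)
    -- Set of directories that directly contain a Snek.module file.
    local module_dirs = {}
    for _, p in ipairs(paths) do
      module_dirs[path.dirname(p)] = true
    end

    local jobs = {}
    for _, p in ipairs(paths) do
      -- ancestors(p)[1] is dirname(p) itself, so start at index 2
      -- to look only at strictly enclosing directories.
      local dirs = path.ancestors(p)
      local nested = false
      for i = 2, #dirs do
        if module_dirs[dirs[i]] then
          nested = true
          break
        end
      end

      if not nested then
        table.insert(jobs, {
          indexer = "acme/snek:latest",
          root = path.dirname(p),
          local_steps = {"snekpm install"},
          indexer_args = {"snek", "index", ".", "--output", "index.scip"},
          outfile = "index.scip",
        })
      end
    end
    return jobs
  end,
}

return require("sg.autoindex.config").new({
  ["acme.snek"] = snek_recognizer,
})
```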
json
This library defines the following two JSON utility functions:
- `encode(val)` returns the JSON encoding of the given Lua object.
- `decode(json)` returns a Lua table representation of the given JSON text.
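The `json` library pairs naturally with `patterns_for_content`, whose matching files appear to reach `generate` through the `contents_by_path` argument. The sketch below assumes that `contents_by_path` maps a matched path to that file's text. It reads a hypothetical `snek.json` manifest so that projects can opt out of indexing with a `"skip_indexing": true` field. How `json.decode` handles malformed input is not documented here, so the call is wrapped in `pcall`.

```lua
local json = require("json")
local path = require("path")
local pattern = require("sg.autoindex.patterns")
local recognizer = require("sg.autoindex.recognizer")

local snek_json_recognizer = recognizer.new_path_recognizer {
  -- Match the (hypothetical) manifest and also request its contents.
  patterns = { pattern.new_path_basename("snek.json") },
  patterns_for_content = { pattern.new_path_basename("snek.json") },

  generate = function(_, paths, contents_by_path)
    local jobs = {}
    for _, p in ipairs(paths) do
      local text = contents_by_path and contents_by_path[p]
      local ok, manifest = false, nil
      if text ~= nil then
        ok, manifest = pcall(json.decode, text)
      end

      -- Skip projects whose manifest explicitly opts out.
      local opted_out = ok and type(manifest) == "table" and manifest.skip_indexing == true
      if not opted_out then
        table.insert(jobs, {
          indexer = "acme/snek:latest",
          root = path.dirname(p),
          local_steps = {"snekpm install"},
          indexer_args = {"snek", "index", ".", "--output", "index.scip"},
          outfile = "index.scip",
        })
      end
    end
    return jobs
  end,
}

return require("sg.autoindex.config").new({
  ["acme.snek"] = snek_json_recognizer,
})
```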
fun
Lua Functional is a high-performance functional programming library accessible via `local fun = require("fun")`. This library has a number of functional utilities to help make recognizer code a bit more expressive.
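For instance, the loop in the Snek example can be written as a pipeline using Lua Functional's `iter`, `filter`, `map`, and `totable` iterator methods. This is only a stylistic alternative. The `filter` step applies a hypothetical policy (ignore a `Snek.module` at the repository root, whose `dirname` is the empty string) purely to show chaining.

```lua
local fun = require("fun")
local path = require("path")
local pattern = require("sg.autoindex.patterns")
local recognizer = require("sg.autoindex.recognizer")

local snek_recognizer = recognizer.new_path_recognizer {
  patterns = { pattern.new_path_basename("Snek.module") },

  generate = function(_, paths)
    return fun.iter(paths)
      -- Example policy: ignore a Snek.module at the repository root.
      :filter(function(p) return path.dirname(p) ~= "" end)
      -- Turn each remaining path into an indexing job description.
      :map(function(p)
        return {
          indexer = "acme/snek:latest",
          root = path.dirname(p),
          local_steps = {"snekpm install"},
          indexer_args = {"snek", "index", ".", "--output", "index.scip"},
          outfile = "index.scip",
        }
      end)
      -- Materialize the lazy iterator into the array generate must return.
      :totable()
  end,
}

return require("sg.autoindex.config").new({
  ["acme.snek"] = snek_recognizer,
})
```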