How precise code intelligence queries are resolved

Precise code intelligence results are obtained by making GraphQL requests to the frontend service. The code intelligence extensions are an example consumer of this API, and their documentation details how code intelligence results are used.

Definitions

A definitions request returns the set of locations that define the symbol at a particular location (defined uniquely by a repository, commit, path, line offset, and character offset). The sequence of actions required to resolve a definitions query is shown below.

First, the repository, commit, and path inputs are used to determine the set of LSIF uploads that can answer queries for that data. Such an upload may have been indexed on another commit. In this case, the output of git diff between the two commits is used to adjust the input path and line number.
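As an illustration of the line adjustment, the Go sketch below maps a line through zero-context diff hunks. It assumes the hunks come from something like `git diff -U0` between the upload's commit (the old side) and the requested commit (the new side), already parsed into a hypothetical `Hunk` type. Adjusting the input position is then a reverse lookup, and mapping results back to the requested commit is a forward lookup. Path renames and character offsets are not handled here.

```go
package main

import "fmt"

// Hunk describes one zero-context change between two commits (hypothetical type).
// Lines [OrigStart, OrigStart+OrigLines) of the old file were replaced by
// lines [NewStart, NewStart+NewLines) of the new file. For a pure insertion or
// deletion, the start on the empty side is the first line after the change,
// which differs from the raw numbers in a `git diff -U0` hunk header.
type Hunk struct {
	OrigStart, OrigLines int
	NewStart, NewLines   int
}

// adjustLine maps a 1-based line from the old file to the new file, or from the
// new file to the old file when reverse is true. It returns false when the line
// falls inside a changed region and therefore has no counterpart.
// Hunks must be sorted by position.
func adjustLine(hunks []Hunk, line int, reverse bool) (int, bool) {
	delta := 0
	for _, h := range hunks {
		start, count, otherCount := h.OrigStart, h.OrigLines, h.NewLines
		if reverse {
			start, count, otherCount = h.NewStart, h.NewLines, h.OrigLines
		}
		if line < start {
			break // this and all later hunks start after the line
		}
		if line < start+count {
			return 0, false // the line itself was changed or removed
		}
		delta += otherCount - count // shift by the net growth of the hunk
	}
	return line + delta, true
}

func main() {
	hunks := []Hunk{
		{OrigStart: 4, OrigLines: 0, NewStart: 4, NewLines: 2},   // two lines inserted before old line 4
		{OrigStart: 10, OrigLines: 1, NewStart: 12, NewLines: 3}, // old line 10 replaced by three lines
	}
	for _, line := range []int{3, 4, 10, 11} {
		adjusted, ok := adjustLine(hunks, line, false)
		fmt.Println(line, "->", adjusted, ok)
	}
}
```

A line that falls inside a changed region has no counterpart on the other commit, so in this sketch a query at that position (or a result located there) simply cannot be translated.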

The adjusted path and position are used to query the definitions at that position using the selected upload data. If a definition is local to the upload, the LSIF store can resolve the query without any additional data. If the definition is remote (defined in a different root of the same repository, or in a different repository), the import monikers of the symbol at the adjusted path and position in the selected upload are determined, along with the package information attached to those monikers. Definitions of the associated moniker are then queried from the codeintel database, using an upload that provides one of those packages.
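The local/remote split can be sketched in Go as follows. The `Store` interface, its methods, and the in-memory `memStore` are hypothetical stand-ins for the LSIF store and codeintel database, not Sourcegraph's actual API. For brevity, the sketch returns the first non-empty result rather than merging results across several monikers.

```go
package main

import "fmt"

// Location is a hypothetical result type: a position inside a given upload.
type Location struct {
	UploadID int
	Path     string
	Line     int
	Char     int
}

// Moniker is a hypothetical import moniker plus the package that provides it.
type Moniker struct {
	Identifier string
	Package    string // e.g. "example.com/lib@v1.0.0"
}

// Store stands in for the LSIF store and codeintel database; every method is hypothetical.
type Store interface {
	LocalDefinitions(uploadID int, path string, line, char int) []Location
	ImportMonikers(uploadID int, path string, line, char int) []Moniker
	UploadProvidingPackage(pkg string) (int, bool)
	DefinitionsByMoniker(uploadID int, identifier string) []Location
}

// Definitions resolves the definitions of the symbol at an already-adjusted position.
func Definitions(s Store, uploadID int, path string, line, char int) []Location {
	// Local definitions can be answered from the selected upload alone.
	if locs := s.LocalDefinitions(uploadID, path, line, char); len(locs) > 0 {
		return locs
	}
	// Remote definitions: follow each import moniker to an upload that provides its package.
	for _, m := range s.ImportMonikers(uploadID, path, line, char) {
		providerID, ok := s.UploadProvidingPackage(m.Package)
		if !ok {
			continue // no upload has indexed this package
		}
		if locs := s.DefinitionsByMoniker(providerID, m.Identifier); len(locs) > 0 {
			return locs
		}
	}
	return nil
}

// memStore is an in-memory Store used only for this example.
type memStore struct {
	local     map[string][]Location // keyed by posKey
	imports   map[string][]Moniker  // keyed by posKey
	providers map[string]int        // package -> upload ID
	exported  map[string][]Location // keyed by "uploadID:identifier"
}

func posKey(uploadID int, path string, line, char int) string {
	return fmt.Sprintf("%d:%s:%d:%d", uploadID, path, line, char)
}

func (m memStore) LocalDefinitions(id int, p string, l, c int) []Location {
	return m.local[posKey(id, p, l, c)]
}

func (m memStore) ImportMonikers(id int, p string, l, c int) []Moniker {
	return m.imports[posKey(id, p, l, c)]
}

func (m memStore) UploadProvidingPackage(pkg string) (int, bool) {
	id, ok := m.providers[pkg]
	return id, ok
}

func (m memStore) DefinitionsByMoniker(id int, ident string) []Location {
	return m.exported[fmt.Sprintf("%d:%s", id, ident)]
}

func main() {
	store := memStore{
		imports:   map[string][]Moniker{posKey(1, "main.go", 10, 4): {{Identifier: "lib.Parse", Package: "example.com/lib@v1.0.0"}}},
		providers: map[string]int{"example.com/lib@v1.0.0": 2},
		exported:  map[string][]Location{"2:lib.Parse": {{UploadID: 2, Path: "parse.go", Line: 3, Char: 5}}},
	}
	fmt.Println(Definitions(store, 1, "main.go", 10, 4))
}
```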

Finally, if the resulting locations were provided by an upload that was indexed on a commit distinct from the input commit, git diff is used again to adjust the results to the target commit.

Code appendix

References

A references request returns the set of locations that reference the symbol at a particular location (defined uniquely by a repository, commit, path, line offset, and character offset). Unlike the set of definitions, which should generally have only one member, the set of references can be unbounded for popular repositories. References are therefore resolved in chunks, allowing the user to request reference results page by page. The sequence of actions required to resolve a references query is shown below.

First, the repository, commit, and path inputs are used to determine the set of LSIF uploads that can answer queries for that data. Such an upload may have been indexed on another commit. In this case, the output of git diff between the two commits is used to adjust the input path and line number.

A references request optionally supplies a cursor that encodes the state of the previous request (the first request supplies no cursor). If a cursor is supplied, it is decoded and validated. Otherwise, one is created with the input repository, commit, adjusted path, adjusted position, the selected upload identifier, and the monikers of the symbol at the adjusted path and position in the selected upload. Note that this step may be repeated over multiple uploads: each upload returned in the previous step has its own cursor, encoded and decoded independently at the GraphQL resolver layer.
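One way to keep a cursor opaque to clients is to serialize its state as JSON and base64-encode it. The Go sketch below assumes that encoding; the `Cursor` fields mirror the state listed above (plus a phase and an offset), but the field names, JSON layout, and validation rules are illustrative assumptions rather than the actual resolver code.

```go
package main

import (
	"encoding/base64"
	"encoding/json"
	"errors"
	"fmt"
)

// Cursor is a hypothetical references cursor for a single upload.
type Cursor struct {
	Repository string   `json:"repository"`
	Commit     string   `json:"commit"`
	Path       string   `json:"path"`
	Line       int      `json:"line"`
	Character  int      `json:"character"`
	UploadID   int      `json:"uploadId"`
	Monikers   []string `json:"monikers"`
	Phase      string   `json:"phase"`  // which phase of the result set comes next
	Offset     int      `json:"offset"` // results already returned in that phase
}

// EncodeCursor turns a cursor into an opaque, URL-safe string for the client.
func EncodeCursor(c Cursor) (string, error) {
	b, err := json.Marshal(c)
	if err != nil {
		return "", err
	}
	return base64.RawURLEncoding.EncodeToString(b), nil
}

// DecodeCursor reverses EncodeCursor and rejects cursors missing required state.
func DecodeCursor(s string) (Cursor, error) {
	var c Cursor
	b, err := base64.RawURLEncoding.DecodeString(s)
	if err != nil {
		return Cursor{}, fmt.Errorf("malformed cursor: %w", err)
	}
	if err := json.Unmarshal(b, &c); err != nil {
		return Cursor{}, fmt.Errorf("malformed cursor: %w", err)
	}
	if c.Repository == "" || c.Commit == "" || c.UploadID <= 0 {
		return Cursor{}, errors.New("invalid cursor: missing repository, commit, or upload")
	}
	return c, nil
}

// cursorFor decodes the client's cursor, or uses a freshly built one on the first request.
func cursorFor(encoded string, fresh Cursor) (Cursor, error) {
	if encoded == "" {
		return fresh, nil
	}
	return DecodeCursor(encoded)
}

func main() {
	fresh := Cursor{Repository: "github.com/example/repo", Commit: "deadbeef", Path: "main.go",
		Line: 10, Character: 4, UploadID: 42, Monikers: []string{"lib.Parse"}, Phase: "sameDump"}
	c, _ := cursorFor("", fresh)
	encoded, _ := EncodeCursor(c)
	decoded, err := cursorFor(encoded, Cursor{})
	fmt.Println(decoded.UploadID, decoded.Phase, err)
}
```

Because each upload has its own cursor, a resolver following this approach could keep one encoded string per upload, for example in a map keyed by upload ID.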

The cursor decoded or created above drives the resolution of the current page of results. While the current page holds fewer results than were requested, another batch of locations is requested using the current cursor and appended to the current page. Each such request also returns a new cursor. This cursor is ultimately sent back to the client so they can make a subsequent request, and is also used as the new current cursor if a subsequent batch of locations is requested.
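The page-filling loop can be sketched in Go with a hypothetical `BatchFunc` that returns a batch of locations together with the advanced cursor. The in-memory `sliceBatches` source exists only for the example; `main` caps each batch at three results so that a single page needs more than one batch.

```go
package main

import "fmt"

// PageCursor is a hypothetical cursor: how many results were consumed, and whether any remain.
type PageCursor struct {
	Offset int
	Done   bool
}

// BatchFunc fetches up to limit more locations starting at the cursor and returns the advanced cursor.
type BatchFunc func(c PageCursor, limit int) ([]string, PageCursor)

// ResolvePage requests batches until the page is full or the results are exhausted.
// The returned cursor is the one the client sends back for the next page.
func ResolvePage(fetch BatchFunc, c PageCursor, pageSize int) ([]string, PageCursor) {
	var page []string
	for len(page) < pageSize && !c.Done {
		batch, next := fetch(c, pageSize-len(page))
		page = append(page, batch...)
		c = next // the new cursor drives the next batch, if there is one
		if len(batch) == 0 {
			break // guard against a source that makes no progress
		}
	}
	return page, c
}

func minInt(a, b int) int {
	if a < b {
		return a
	}
	return b
}

// sliceBatches serves batches of at most maxBatch items from an in-memory slice (example only).
func sliceBatches(items []string, maxBatch int) BatchFunc {
	return func(c PageCursor, limit int) ([]string, PageCursor) {
		end := minInt(c.Offset+minInt(limit, maxBatch), len(items))
		return items[c.Offset:end], PageCursor{Offset: end, Done: end >= len(items)}
	}
}

func main() {
	fetch := sliceBatches([]string{"a", "b", "c", "d", "e", "f", "g"}, 3)
	page, cursor := ResolvePage(fetch, PageCursor{}, 5)
	fmt.Println(page, cursor)
	page, cursor = ResolvePage(fetch, cursor, 5)
	fmt.Println(page, cursor)
}
```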

Finally, if the resulting locations were provided by an upload that was indexed on a commit distinct from the input commit, git diff is used again to adjust the results to the target commit.


The sequence of actions required to resolve a page of references given a cursor is shown below.

The cursor can be in one of five phases, ordered as follows. Each phase handles a distinct segment of the result set. A phase may return no results, or it may return multiple pages' worth of results. In the latter case, the cursor encodes sufficient information (e.g. the number of uploads and the references previously returned in the phase) to skip duplicate results.

  1. The sameDumpCursor phase retrieves reference results from the upload in which the target symbol is indexed. This phase will return local references to symbols defined in the same upload. This phase will also, for some but not all indexer output, return references to remote symbols.
  2. The sameDumpMonikersCursor phase retrieves reference results by the moniker of the target symbol from the upload in which the target symbol is indexed. This excludes the reference results returned by the previous phase. This phase is necessary because not all indexer output uniquely correlates the references of symbols defined externally.
  3. The definitionMonikersCursor phase retrieves reference results by moniker from the upload in which the symbol definition is indexed (if it is distinct from the upload in which the target symbol is indexed).
  4. The sameRepoCursor phase retrieves reference results by moniker from all uploads for the same repository. This includes only uploads for roots that are distinct from the root of the upload in which the target symbol is indexed. This handles results from large repositories that are split into multiple, separately indexed projects.
  5. The remoteRepoCursor phase retrieves reference results by moniker from all uploads for distinct repositories. This enables true cross-repository reference results.
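The phase ordering behaves like a small state machine: a cursor names the current phase and how many of its results were already returned, and a phase that is exhausted (or empty) hands off to the next one at offset zero. The Go sketch below assumes one hypothetical `Source` function per phase; a phase with no source, such as the definitionMonikersCursor phase when the definition lives in the same upload, is skipped. Excluding results already returned by an earlier phase (as the sameDumpMonikersCursor phase does) is omitted.

```go
package main

import "fmt"

// Phase enumerates the five reference phases in order (names follow the text; the type is hypothetical).
type Phase int

const (
	SameDump Phase = iota
	SameDumpMonikers
	DefinitionMonikers
	SameRepo
	RemoteRepo
	Done // all phases exhausted
)

var phaseNames = [...]string{"sameDumpCursor", "sameDumpMonikersCursor", "definitionMonikersCursor", "sameRepoCursor", "remoteRepoCursor", "done"}

func (p Phase) String() string { return phaseNames[p] }

// Cursor records the current phase and how many results that phase has already returned.
type Cursor struct {
	Phase  Phase
	Offset int
}

// Source returns up to limit results of one phase starting at offset, and whether the phase is exhausted.
type Source func(offset, limit int) (results []string, exhausted bool)

// NextBatch returns the next non-empty batch, moving through the phases in order.
// Phases with no source or no results are skipped.
func NextBatch(sources map[Phase]Source, c Cursor, limit int) ([]string, Cursor) {
	for c.Phase != Done {
		src, ok := sources[c.Phase]
		if !ok {
			c = Cursor{Phase: c.Phase + 1}
			continue
		}
		results, exhausted := src(c.Offset, limit)
		next := Cursor{Phase: c.Phase, Offset: c.Offset + len(results)}
		if exhausted || len(results) == 0 {
			next = Cursor{Phase: c.Phase + 1} // start the following phase from its beginning
		}
		if len(results) > 0 {
			return results, next
		}
		c = next
	}
	return nil, c
}

// fromSlice builds a Source over a fixed slice (example only).
func fromSlice(items []string) Source {
	return func(offset, limit int) ([]string, bool) {
		if offset > len(items) {
			offset = len(items)
		}
		end := offset + limit
		if end >= len(items) {
			return items[offset:], true
		}
		return items[offset:end], false
	}
}

func main() {
	sources := map[Phase]Source{
		SameDump:         fromSlice([]string{"a", "b", "c"}),
		SameDumpMonikers: fromSlice(nil),
		SameRepo:         fromSlice([]string{"d"}),
		RemoteRepo:       fromSlice([]string{"e", "f"}),
	}
	c := Cursor{Phase: SameDump}
	for c.Phase != Done {
		var batch []string
		batch, c = NextBatch(sources, c, 2)
		fmt.Println(batch, "next phase:", c.Phase)
	}
}
```

Returning one phase's batch at a time keeps each call simple; a page-filling loop on top of it can combine batches from several phases into one page.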

Code appendix

Hover

A hover request returns the hover text associated with the symbol at a particular location (defined uniquely by a repository, commit, path, line offset, and character offset), as well as the range of the hovered symbol. The sequence of actions required to resolve a hover query is shown below.

First, the repository, commit, and path inputs are used to determine the set of LSIF uploads that can answer queries for that data. Such an upload may have been indexed on another commit. In this case, the output of git diff between the two commits is used to adjust the input path and line number.

The adjusted path and position are used to query the hover at that position using the selected upload data. For most indexers, this is enough to completely resolve the hover data; however, we have seen indexes in which cross-repository symbols do not link their hover text correctly. In these cases, the definition of the symbol at the same location is determined and, if exactly one such definition is found, another hover query is performed using the definition's position.
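The hover fallback can be sketched in Go as below. `HoverStore`, `memHover`, and the `Position`/`Range` types are hypothetical, and the sketch looks up the definition in the same store, whereas a real cross-repository definition would come from a different upload. The returned range is that of the hovered symbol, not of its definition.

```go
package main

import "fmt"

// Position and Range are hypothetical types for locations within one upload.
type Position struct {
	Path       string
	Line, Char int
}

type Range struct {
	StartLine, StartChar, EndLine, EndChar int
}

// HoverStore stands in for the LSIF store; both methods are hypothetical.
type HoverStore interface {
	// Hover reports the hover text and symbol range at p; ok is false if no symbol is there.
	Hover(p Position) (text string, r Range, ok bool)
	Definitions(p Position) []Position
}

// ResolveHover returns hover text for the symbol at p, falling back to the hover
// text of its definition when the symbol itself carries none.
func ResolveHover(s HoverStore, p Position) (string, Range, bool) {
	text, r, ok := s.Hover(p)
	if !ok {
		return "", Range{}, false // no symbol at this position
	}
	if text != "" {
		return text, r, true
	}
	defs := s.Definitions(p)
	if len(defs) != 1 {
		return "", Range{}, false // missing or ambiguous definition: give up
	}
	defText, _, defOK := s.Hover(defs[0])
	if !defOK || defText == "" {
		return "", Range{}, false
	}
	// Use the definition's text but keep the range of the symbol the user hovered.
	return defText, r, true
}

type hoverEntry struct {
	text string
	r    Range
}

// memHover is an in-memory HoverStore used only for this example.
type memHover struct {
	hovers map[Position]hoverEntry
	defs   map[Position][]Position
}

func (m memHover) Hover(p Position) (string, Range, bool) {
	e, ok := m.hovers[p]
	return e.text, e.r, ok
}

func (m memHover) Definitions(p Position) []Position { return m.defs[p] }

func main() {
	ref := Position{Path: "main.go", Line: 10, Char: 4}
	def := Position{Path: "lib/parse.go", Line: 3, Char: 5}
	store := memHover{
		hovers: map[Position]hoverEntry{
			ref: {text: "", r: Range{StartLine: 10, StartChar: 4, EndLine: 10, EndChar: 9}},
			def: {text: "func Parse(s string) error", r: Range{StartLine: 3, StartChar: 5, EndLine: 3, EndChar: 10}},
		},
		defs: map[Position][]Position{ref: {def}},
	}
	text, r, ok := ResolveHover(store, ref)
	fmt.Println(text, r.StartLine, ok)
}
```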

Finally, if the result was provided by an upload that was indexed on a commit distinct from the input commit, git diff is used again to adjust the returned range to the target commit.

Code appendix