Telemetry export architecture
This page outlines the architecture and components involved in Sourcegraph's new telemetry export system.
Over its lifecycle, an event is first stored on the instance and then exported to the Telemetry Gateway.
See testing events for a summary of how to observe your events during development.
Storing events
Once recorded, telemetry events are stored in two places:
- The structured event_logs table, for use in admin analytics, translated from the Telemetry Gateway format on a best-effort basis.
- The unstructured telemetry_events_export_queue table, which stores raw event payloads in Protobuf wire format for export.
  - This table only retains events until they are marked as exported. Once exported, they are pruned after the duration specified by TELEMETRY_GATEWAY_EXPORTER_EXPORTED_EVENTS_RETENTION.
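The retention rule above can be sketched in Go as follows. This is a minimal in-memory illustration under stated assumptions, not the actual implementation: the queuedEvent type, its fields, prune, sampleQueue, and exampleRetention are hypothetical names, and the 48-hour window is an arbitrary example value standing in for TELEMETRY_GATEWAY_EXPORTER_EXPORTED_EVENTS_RETENTION.

```go
package main

import (
	"fmt"
	"time"
)

// queuedEvent is a hypothetical in-memory stand-in for a row in
// telemetry_events_export_queue.
type queuedEvent struct {
	ID         string
	ExportedAt *time.Time // nil until the event is marked as exported
}

// exampleRetention is an arbitrary example value for the retention window.
const exampleRetention = 48 * time.Hour

// prune returns the events that stay in the queue: everything not yet
// exported, plus exported events still inside the retention window.
func prune(events []queuedEvent, now time.Time, retention time.Duration) []queuedEvent {
	kept := make([]queuedEvent, 0, len(events))
	for _, ev := range events {
		if ev.ExportedAt == nil || now.Sub(*ev.ExportedAt) < retention {
			kept = append(kept, ev)
		}
	}
	return kept
}

// sampleQueue builds a small fixed queue and the "current" time used with it.
func sampleQueue() ([]queuedEvent, time.Time) {
	now := time.Date(2024, 1, 10, 0, 0, 0, 0, time.UTC)
	old := now.Add(-72 * time.Hour)
	recent := now.Add(-time.Hour)
	return []queuedEvent{
		{ID: "a"},                      // not exported yet: always kept
		{ID: "b", ExportedAt: &old},    // exported 72h ago: pruned
		{ID: "c", ExportedAt: &recent}, // exported 1h ago: kept
	}, now
}

func main() {
	events, now := sampleQueue()
	for _, ev := range prune(events, now, exampleRetention) {
		fmt.Println(ev.ID)
	}
}
```

Events that have not been exported are never pruned in this sketch, regardless of age, which matches the rule that only exported events are subject to the retention window.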
The "tee" store, including the translation from the Telemetry Gateway event schema to the event_logs table, is implemented in internal/telemetry/teestore.
Note that before events are stored into telemetry_events_export_queue, sensitive attributes are stripped. This means that the contents of telemetry_events_export_queue are exactly what gets exported from an instance.
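To illustrate the "tee" pattern and the stripping step, here is a minimal Go sketch. It does not reproduce the code in internal/telemetry/teestore: the event, store, memStore, and teeStore types, the PrivateMetadata field, and the stripSensitive function are all hypothetical, and the sketch assumes that only the copy bound for the export queue is stripped.

```go
package main

import (
	"errors"
	"fmt"
)

// event is a hypothetical, simplified telemetry event.
type event struct {
	Feature         string
	Action          string
	Metadata        map[string]string // assumed safe to export
	PrivateMetadata map[string]string // assumed sensitive
}

// store is a hypothetical destination for events.
type store interface {
	Store(events []event) error
}

// memStore keeps events in memory so the example can run without a database.
type memStore struct{ rows []event }

func (m *memStore) Store(events []event) error {
	m.rows = append(m.rows, events...)
	return nil
}

// stripSensitive returns a copy of ev without its sensitive attributes.
// ev is passed by value, so the caller's event is left untouched.
func stripSensitive(ev event) event {
	ev.PrivateMetadata = nil
	return ev
}

// teeStore writes every event to two destinations.
type teeStore struct {
	eventLogs   store // stands in for the event_logs table
	exportQueue store // stands in for telemetry_events_export_queue
}

func (t teeStore) Store(events []event) error {
	// Strip sensitive attributes before anything reaches the export queue,
	// so the queue holds exactly what would be exported.
	stripped := make([]event, len(events))
	for i, ev := range events {
		stripped[i] = stripSensitive(ev)
	}
	// Assumption: attempt both writes and report any failures together.
	return errors.Join(
		t.eventLogs.Store(events),
		t.exportQueue.Store(stripped),
	)
}

func main() {
	logs, queue := &memStore{}, &memStore{}
	tee := teeStore{eventLogs: logs, exportQueue: queue}
	err := tee.Store([]event{{
		Feature:         "exampleFeature",
		Action:          "view",
		Metadata:        map[string]string{"version": "1"},
		PrivateMetadata: map[string]string{"query": "secret"},
	}})
	if err != nil {
		fmt.Println("store failed:", err)
		return
	}
	fmt.Printf("export queue row: %+v\n", queue.rows[0])
}
```

Stripping before the write, rather than at export time, is what lets the export queue double as an exact record of what leaves the instance.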
Exporting events
The telemetrygatewayexporter running in the worker service spawns a set of background jobs that handle:
- Reporting metrics on the telemetry_events_export_queue
- Cleaning up already-exported entries in the telemetry_events_export_queue
- Exporting batches of not-yet-exported entries in the telemetry_events_export_queue to the Telemetry Gateway service
When exporting events, we mark an event as successfully exported only when the Telemetry Gateway returns a response that includes that event's generated ID. This ensures that every event is exported at least once.
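The acknowledgement rule can be sketched in Go as follows. This is an illustration under stated assumptions rather than the exporter's actual code: exportFunc stands in for the call to the Telemetry Gateway, the in-memory queue type and its methods are hypothetical, and the sketch assumes that any IDs returned alongside an error are still treated as acknowledged.

```go
package main

import "fmt"

// exportFunc is a hypothetical stand-in for the call to the Telemetry
// Gateway. It returns the IDs of the events the gateway acknowledged.
type exportFunc func(batch []string) (succeeded []string, err error)

// queue is a hypothetical in-memory export queue of event IDs.
type queue struct {
	order    []string        // event IDs in insertion order
	exported map[string]bool // event ID -> marked as exported
}

func newQueue(ids ...string) *queue {
	q := &queue{exported: make(map[string]bool)}
	for _, id := range ids {
		q.order = append(q.order, id)
		q.exported[id] = false
	}
	return q
}

// pending returns up to limit event IDs that are not yet marked as exported.
func (q *queue) pending(limit int) []string {
	var out []string
	for _, id := range q.order {
		if len(out) == limit {
			break
		}
		if !q.exported[id] {
			out = append(out, id)
		}
	}
	return out
}

// exportBatch sends one batch and marks only acknowledged events as exported.
// Anything not acknowledged stays pending and is retried on the next run.
func (q *queue) exportBatch(limit int, export exportFunc) error {
	batch := q.pending(limit)
	if len(batch) == 0 {
		return nil
	}
	succeeded, err := export(batch)
	for _, id := range succeeded {
		if _, known := q.exported[id]; known {
			q.exported[id] = true
		}
	}
	return err
}

func main() {
	q := newQueue("a", "b", "c")
	sent := map[string]int{}
	calls := 0
	// A fake gateway that acknowledges only the first event on its first call.
	gateway := func(batch []string) ([]string, error) {
		calls++
		for _, id := range batch {
			sent[id]++
		}
		if calls == 1 {
			return batch[:1], nil
		}
		return batch, nil
	}
	for len(q.pending(10)) > 0 {
		if err := q.exportBatch(10, gateway); err != nil {
			fmt.Println("export failed:", err)
			return
		}
	}
	fmt.Println("times sent per event:", sent)
}
```

In this run, events "b" and "c" are sent twice because the first response did not acknowledge them. That is the trade-off of at-least-once delivery: no event is dropped, but a consumer may see the same event more than once.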
Telemetry Gateway
The Telemetry Gateway is a managed Sourcegraph service that ingests event exports from all Sourcegraph instances, and handles manipulating the events and publishing raw payloads to a Pub/Sub topic.
It exposes a gRPC API defined in telemetrygateway/v1 - see exported events schema.
From the events received over the gRPC API, the Telemetry Gateway constructs raw JSON events and publishes them to a designated Pub/Sub topic, from which they eventually make their way into BigQuery.
Also see How to set up Telemetry Gateway locally.
For details about live Telemetry Gateway deployments, refer to the handbook Telemetry Gateway page.