Adding, changing and debugging pings
This page outlines the process for adding or changing the data collected from Sourcegraph instances through pings.
Ping philosophy
Pings are the only data Sourcegraph receives from installations. Our users and customers trust us with their most sensitive data. We must preserve and build this trust by making only careful additions and changes to pings.
All ping data must be:
- Anonymous (with only one exception: the email address of the initial site installer)
- Aggregated (e.g. number of times a search filter was used per day, instead of the actual search queries)
- Non-specific (e.g. no repo names, no usernames, no file names, no specific search queries, etc.)
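To make "aggregated" and "non-specific" concrete, here is a minimal Python sketch of the idea. It is not Sourcegraph's implementation: the (day, query) event shape, the KNOWN_FILTERS allow-list, and the function names are all hypothetical. Raw search queries are reduced to per-day counts of filter keywords, and filter values such as repo or file names are dropped before anything would be reported.

```python
from __future__ import annotations

from collections import Counter, defaultdict
from datetime import date

# Hypothetical allow-list of search filter keywords that may be counted.
KNOWN_FILTERS = {"repo", "file", "lang", "case", "type"}


def filter_names(query: str) -> list[str]:
    """Return the filter keywords used in a query, dropping their values.

    "repo:acme/secret-project" yields "repo"; the repo name is discarded.
    """
    names = []
    for token in query.split():
        key, sep, _value = token.partition(":")
        if sep and key in KNOWN_FILTERS:
            names.append(key)
    return names


def aggregate_filter_usage(events: list[tuple[date, str]]) -> dict[str, dict[str, int]]:
    """Reduce raw (day, query) events to per-day counts of each filter keyword.

    Only the counts are returned: no query text, repo names, or file names.
    """
    per_day: defaultdict[str, Counter] = defaultdict(Counter)
    for day, query in events:
        per_day[day.isoformat()].update(filter_names(query))
    return {day: dict(counts) for day, counts in sorted(per_day.items())}


if __name__ == "__main__":
    events = [
        (date(2020, 1, 1), "repo:acme/secret-project lang:go TODO"),
        (date(2020, 1, 1), "repo:acme/other fmt.Println"),
        (date(2020, 1, 2), "file:internal/auth.go password"),
    ]
    print(aggregate_filter_usage(events))
    # prints {'2020-01-01': {'repo': 2, 'lang': 1}, '2020-01-02': {'file': 1}}
```

The allow-list is the design choice that keeps the data non-specific: an unknown "key:value" token is ignored rather than counted, so nothing user-supplied can leak in as a dictionary key.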
Adding data to pings
Treat adding new data to pings as having a very high bar. Would you be willing to send an email to all Sourcegraph users explaining and justifying why we need to collect this additional data from them? If not, don’t propose it.
1. Write an RFC describing the problem, the data that will be added, and how Sourcegraph will use the data to make decisions. sourcegraph/bizops must be a required reviewer. Please include the following information in the RFC:
- What are the exact data fields you're requesting to add?
- What are the exact questions you're trying to answer with this new data? Why can't we use existing data to answer them?
- What does the JSON payload look like once those fields are added?
The RFC should also include answers to these questions (if applicable):
- Why was this particular metric/data chosen? What business problem does collecting this address?
- What specific product or engineering decisions will be made by having this data?
- Will this data be needed from every single installation, or only from a select few?
- Will it be needed forever, or only for a short time? If only for a short time, what are the criteria and estimated timeline for removing the data point(s)?
- Have you considered alternatives? E.g., collecting this data from Sourcegraph.com, or adding a report for admins that we can request from some number of friendly customers?
These RFCs are great examples: Adding code host versions to Pings and Adding Sourcegraph Extensions Usage Metrics to Pings.
2. When the RFC is approved, implement the change using the life of a ping documentation, with an example PR as a guide. At least one member of the BizOps team must approve the resulting PR before it can be merged. DO NOT merge your PR yet: steps 3, 4, and 5 must be completed first.
- Ensure a CHANGELOG entry is added, and that the two sources of truth for ping data are updated along with your PR:
  - Pings documentation: https://docs.sourcegraph.com/admin/pings
  - The Site-admin > Pings page, e.g.: https://sourcegraph.com/site-admin/pings
3. Determine whether any transformations/ETL jobs are required, and if so, add them to the script. The script is mainly for edge cases: as long as zeroes or nulls (rather than "") are sent back when a data point is empty, a transformation is usually not needed.
4. Open a PR to change the BigQuery schema, with sourcegraph/bizops as approvers. Note: we have a 3-business-day SLA to test and merge the production schema change, so allow enough time to test it properly before the branch cut. Keep in mind:
- Check that the data types sent in the JSON match the BigQuery schema (e.g. a JSON string "1" will not match a BigQuery INTEGER).
- No field in the BigQuery schema should be non-nullable ("mode": "REQUIRED"); "mode": "NULLABLE" and "mode": "REPEATED" are acceptable. Instances on older Sourcegraph versions will not send the new data fields, and a non-nullable field would cause their pings to fail.
5. Once the schema change PR is merged, test the new schema. Contact sourcegraph/bizops (#data-eng-ops or #analytics) for this part.
- Delete the test table ($DATASET.$TABLE_test), create a new table with the same name (update_checks_test), and then upload the newest version of the schema (see "Changing the BigQuery schema" for commands). This wipes the data in the table and any legacy configuration that could trigger a false-positive test, while keeping the connection with Pub/Sub.
- Update and publish a message to Pub/Sub, which will go through Dataflow to the BigQuery test table. Use this example as a baseline for the message and add sample data for the new ping data points.
- To see if it worked, go to the update_checks_test table and run a query against it that checks for the new data points. Messages that fail to publish are added to the error records table.
6. Merge your PR.
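Steps 3 to 5 can be rehearsed locally before anything reaches the test table. The following is a minimal Python sketch, not part of Sourcegraph's tooling: it assumes the schema is the JSON array of field definitions that BigQuery uses, and the field names (site_id, search_filter_count, code_host_versions) and helper names are hypothetical. It replaces empty strings with null, flags REQUIRED fields and JSON/BigQuery type mismatches such as "1" versus INTEGER, and serializes the payload as the body of a Pub/Sub test message.

```python
from __future__ import annotations

import json

# Hypothetical excerpt of a BigQuery schema file (a JSON array of field
# definitions). These field names are illustrative, not real ping fields.
SCHEMA = [
    {"name": "site_id", "type": "STRING", "mode": "NULLABLE"},
    {"name": "search_filter_count", "type": "INTEGER", "mode": "NULLABLE"},
    {"name": "code_host_versions", "type": "STRING", "mode": "REPEATED"},
]


def normalize_empty(payload: dict) -> dict:
    """Step 3: report an empty top-level data point as null rather than ""."""
    return {key: (None if value == "" else value) for key, value in payload.items()}


def json_matches(bq_type: str, value: object) -> bool:
    """Rough check that a decoded JSON value fits a BigQuery column type."""
    if bq_type == "BOOLEAN":
        return isinstance(value, bool)
    if isinstance(value, bool):
        return False  # JSON true/false is not a number or a string
    if bq_type == "INTEGER":
        return isinstance(value, int)
    if bq_type == "FLOAT":
        return isinstance(value, (int, float))
    if bq_type == "STRING":
        return isinstance(value, str)
    return True  # other types (RECORD, TIMESTAMP, ...) are not checked here


def schema_problems(schema: list[dict], payload: dict) -> list[str]:
    """Step 4: list mismatches between a ping payload and the schema."""
    problems = []
    known = {field["name"] for field in schema}
    for name in payload:
        if name not in known:
            problems.append(f"{name}: not in the schema")
    for field in schema:
        name, bq_type = field["name"], field["type"]
        mode = field.get("mode", "NULLABLE")  # BigQuery's default mode
        if mode == "REQUIRED":
            problems.append(f"{name}: mode is REQUIRED; use NULLABLE or REPEATED")
        value = payload.get(name)
        if value is None:
            continue  # older instances omit new fields; fine unless REQUIRED
        if mode == "REPEATED":
            if not isinstance(value, list):
                problems.append(f"{name}: expected a JSON array")
                continue
            values = value
        else:
            values = [value]
        for item in values:
            if not json_matches(bq_type, item):
                problems.append(f"{name}: expected {bq_type}, got {type(item).__name__}")
    return problems


def to_message_bytes(payload: dict) -> bytes:
    """Step 5: serialize the sample payload as a Pub/Sub message body."""
    return json.dumps(payload).encode("utf-8")
```

For example, normalize_empty({"site_id": "", "search_filter_count": "1", "code_host_versions": ["GitHub 3.0"]}) passed to schema_problems(SCHEMA, ...) reports "search_filter_count: expected INTEGER, got str", which is exactly the kind of mismatch that would otherwise only surface in the error records table. The type checks are deliberately partial. The resulting bytes could then be published with the google-cloud-pubsub client (PublisherClient().publish(topic_path, data)); which project and topic to use is not stated here, so confirm with sourcegraph/bizops.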
Changing the BigQuery schema
Commands:
- To retrieve the current schema:
  bq --project_id=$PROJECT --format=prettyjson show --schema $DATASET.$TABLE > schema.json
  replacing $PROJECT with the project ID and $DATASET.$TABLE with the dataset and table name, separated by a dot.
- To update the schema:
  bq --project_id=$PROJECT update --schema $SCHEMA_FILE $DATASET.$TABLE
  with the same replacements as above, and $SCHEMA_FILE replaced with the path to the updated schema JSON file (for example, schema.json with the new fields added).
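The schema file passed to the update command is a JSON array of field definitions. Below is a minimal Python sketch for producing it from the retrieved schema.json; it is an illustration, not an official script. It assumes the retrieved file is either that bare array or a full table resource that nests the array under "schema" and "fields", and the output name new_schema.json and the example field are hypothetical. New fields are added with mode NULLABLE, since BigQuery only allows adding NULLABLE or REPEATED columns to an existing table and older instances will not send them.

```python
from __future__ import annotations

import json


def schema_fields(retrieved: dict | list) -> list[dict]:
    """Return the list of field definitions from a retrieved schema.

    Accepts either a bare field array or a full table resource, where the
    fields are nested under "schema" -> "fields".
    """
    if isinstance(retrieved, list):
        return retrieved
    return retrieved["schema"]["fields"]


def add_nullable_fields(fields: list[dict], new_fields: list[tuple[str, str]]) -> list[dict]:
    """Return a copy of `fields` with each (name, type) appended as NULLABLE."""
    existing = {field["name"] for field in fields}
    updated = list(fields)
    for name, bq_type in new_fields:
        if name in existing:
            raise ValueError(f"field {name!r} is already in the schema")
        updated.append({"name": name, "type": bq_type, "mode": "NULLABLE"})
        existing.add(name)
    return updated


def prepare_schema_file(in_path: str, out_path: str, new_fields: list[tuple[str, str]]) -> None:
    """Read the retrieved schema, add the new fields, and write the update file."""
    with open(in_path, encoding="utf-8") as f:
        fields = schema_fields(json.load(f))
    with open(out_path, "w", encoding="utf-8") as f:
        json.dump(add_nullable_fields(fields, new_fields), f, indent=2)
```

A call such as prepare_schema_file("schema.json", "new_schema.json", [("search_filter_count", "INTEGER")]) would leave a file to use as $SCHEMA_FILE, first against the test table and then against production.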
To update the schema:
- Run the update schema command on a test table.
- Once the test is complete, run the update schema command on the production table.
Changing the BigQuery scheduled queries
- Add the fields you'd like to bring into BigQuery/Looker to the instances scheduled queries 1 and 2.
- If day-over-day (or similar) data is necessary, create a new table/scheduled query. For example, daily active users needs a separate table and scheduled query.
Debugging pings
Options for debugging ping abnormalities. Refer to life of a ping for the steps in the ping process.
- BigQuery: Query the update_checks error records and/or check the latest pings received, looking them up by the installer admin's email.
- Dataflow: Review the Dataflow job. WriteSuccessfulRecords should show throughput, and the Failed/Error steps should show none.
- Stackdriver (log viewer): Check the frontend logs, which contain all pings that come through Sourcegraph.com. Use the following advanced filter to find the pings you're interested in, replacing [COMPANY] with the company name:
  resource.type="k8s_container" resource.labels="dot-com" resource.labels.cluster_name="prod" resource.labels.container_name="frontend" "[COMPANY]" AND "updatecheck"
- Grafana: Run src_updatecheck_client_duration_seconds_sum on Grafana to understand how long each method is taking. Request this information from an instance admin, if necessary.
- Test on a Sourcegraph dev instance to make sure the pings are being sent properly.
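A _sum series only ever grows, so on its own it shows total time spent rather than how long a single update check takes. Dividing its increase over a window by the increase of the matching _count series gives the mean duration per check; in a Grafana panel this is usually written as rate(src_updatecheck_client_duration_seconds_sum[5m]) / rate(src_updatecheck_client_duration_seconds_count[5m]). That assumes the metric is a Prometheus histogram or summary with a _count companion series, which this page does not confirm. A small Python sketch of the same arithmetic between two scrapes, treating any decrease as a counter reset:

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Scrape:
    """Cumulative values of one series at a point in time."""

    seconds_sum: float  # src_updatecheck_client_duration_seconds_sum
    count: float  # assumed companion series ..._count (not confirmed here)


def mean_duration(earlier: Scrape, later: Scrape) -> float | None:
    """Average seconds per update check between two scrapes.

    Returns None when no checks happened in the window, or when a counter
    reset (a value going down) makes the difference meaningless.
    """
    checks = later.count - earlier.count
    seconds = later.seconds_sum - earlier.seconds_sum
    if checks <= 0 or seconds < 0:
        return None
    return seconds / checks
```

Apply it per series (for example, per method label, if one exists) to compare how long each method takes.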