Secret Detection Metrics
Overview
This page documents the process a member of the Secure: Secret Detection team should use to add metrics to capture product insights for the features we develop.
You should this guide to help you understand the 2 different types of metrics we can use, when to use each one, and give you a jumpstart in implementing them.
Metrics workflow
In general, the workflow for developing and adding metrics is:
flowchart TD A[Product and team determine desired insights] --> B(Implement metrics) B --> C[Verify metrics are visible in Tableau] C -->D[PDI team helps with creating visualizations]
Creating metrics
For our use cases, we utilize Internal Tracking Events or Database Metrics (formerly Service Ping) depending on the situation:
- Internal Tracking Events are good for capturing events that occur but aren’t stored in the database. E.g., a user clicks a specific button.
- Database metrics are useful when the data you want is stored in the database in some way that can be extracted with the right query. E.g., the nubmer of projects that have a setting enabled.
The Analytics Instrumentation team has great documentation here but we outline some of the learnings on implementing metrics for our use here.
Internal tracking events
Internal Tracking Events capture discrete events and can be collected over 7 days, 28 days, or all time. These events require code changes to explicitly fire the event.
Here is an example from Notes::BaseService
...
include Gitlab::InternalEventsTracking
...
track_internal_event('create_commit_note', project: project, user: current_user)
...
Each event has a descriptive name and, when possible, useful data for context like:
- user
- project
- namespace (which will be pulled from
project
if not supplied)
There’s another field, category
that is automatically set to the classname of
the class that the event was fired from. This is important to know for testing.
Each event can also have up to 3 additional data. Events support 2 string values, and 1 numeric.
These additional properties are stored in the additional_properties
map in the
event and the keys are:
label
(string)property
(string)value
(numeric)
You should utilize them in that order when possible, i.e., label
before
property
.
Process for adding
You should follow the process Analytics Instrumentation has defined in the quick start guide. linked docs.
TL;DR Run the ruby scripts/internal_events/cli.rb
CLI tool and follow the
prompts. The event definition is necessary for the code to know what
events to output and the metric definition is what Snowflake and Tableau will
make available given the events.
Testing
There are shared examples that you can utilize to test firing of tracking events:
it_behaves_like 'internal event tracking' do
let(:event) { "detect_secret_type_on_push" }
let(:namespace) { project.namespace }
let(:label) { "GitLab Personal Access Token" }
let(:category) { described_class.name }
...
subject
end
Unlike in the implementation file, when using the shared example in the specs,
you will need to define the :category
to be the class under test.
Shared Examples
If you’re adding internal event tracking tests to shared specs, you need to be
able to redefine the subject
to be what triggers the event firing if it isn’t
already.
As an example, in the specs for Gitlab::Checks::SecretsCheck we use the shared examples from secrets_check_shared_example.rb.
In that file, most of the specs call subject.validate!
to run the secrets
check, but for the internal tracking shared examples, it expects to be able to
just call subject
.
Therefore, to use the internal tracking event shared
examples from our shared examples we have to redefine subject
to subject { super().validate! }
. super()
within the subject{}
block refers to the
predefined subject
object, i.e., the Gitlab::Checks::SecretsCheck
class.
So in this special case, we add a shared example internal event tracking
to
be:
it_behaves_like 'internal event tracking' do
let(:event) { 'skip_secret_push_protection' }
let(:namespace) { project.namespace }
let(:label) { "commit message" }
let(:category) { described_class.name }
subject { super().validate! }
end
Database Metric (Service Ping)
Database metrics, aka Service Pings, are metrics that can be collected with database queries. These metrics are updated in a batch approximately every 7 days. However, this is not guaranteed and may be generated anywhere from 4-10 days.
Process for adding
Database metrics are implemented by a Ruby subclass of
GitLab::Usage::Metrics::Instrumentation::DatabaseMetric
and
utilizes ActiveRecord
relations to build the queries. Alternatively, you can
provide the SQL for the query too.
The class should be in lib/gitlab/usage/metrics/instrumentation/
or the EE
equivalent.
We have a Rails generator that can be used to create the necessary classes:
rails generate gitlab:usage_metric CountIssues --type database --operation distinct_count
create lib/gitlab/usage/metrics/instrumentations/count_issues_metric.rb
create spec/lib/gitlab/usage/metrics/instrumentations/count_issues_metric_spec.rb
The simplest way to implement the metric is to call the class-level #operation
and #relation
methods.
The argument to operation
can be
:count
:distinct_count
:estimate_batch_distinct_count
:sum
:average
relation
takes a block that returns the query results.
Example from `Gitlab::usage::Metrics::Instrumentation::CountProjectsWithSecretPushProtectionEnabledMetric:
class CountProjectsWithSecretPushProtectionEnabledMetric < DatabaseMetric
operation :count
relation do
ProjectSecuritySetting.where(pre_receive_secret_detection_enabled: true)
end
end
Each database metric has to have an accompanying metric dictionary like Internal
Tracking Events. Unfortunately, database metrics are not yet supported by the internal_events
CLI script so must be partially done by hand.
- Create a yaml file in the appropriate subdirectory of
config/metrics
oree/config/metrics
if it’s a metric limited to an enterprise tier.- If the metric is meant to capture all time, use the
counts_all
subdirectory. - Otherwise use the appropriate
counts_7d
orcounts_28d
subdirectory for weekly and monthly metrics respectively.
- If the metric is meant to capture all time, use the
- Use existing yaml files as templates
- Use the schema defined here.
NOTE: Make sure that the milestone is a string.
Testing
Like Internal Tracking Events, database metrics have shared examples that we can utilize in our tests.
it_behaves_like 'a correct instrumented metric value', { time_frame: 'all',
data_source: 'database' }
time_frame
should match the value in the dictionary for the metric that was
defined, i.e., 7d
, 28d
, or all
.
Viewing and analyzing
Verifying creation and deployment
Metrics are collected into Snowflake which are then viewable in Tableau. To verify a metric is in production and being generated there are 2 locations to check:
- Metrics dictionary
- Tableau Service Ping Exploration (You need
Explorer
level access or higher to Tableau)
The Metrics dictionary
only shows what metrics are available and gives you the
ability to copy the Snowflake query to get its values.
The Tableau Service Ping explorer shows the basic values of and allows you see the last 5 generated values. Further analysis must be done either in Tableau or Snowflake.
With the Explorer
role in Tableau, you will be able to create dashboards but
will be limited to using data sources created by someone with higher
permissions. Any new Internal Tracking Event or Database Metric should be
included in existing data sources
Mart Ping Instance Metric Monthly
Mart Ping Instance Metric Weekly
Asking for help
If you’re not familiar with Tableau, creating worksheets and dashboards in it, and haven’t worked through the Tableau-hosted courses, you have some options for help:
- For help in creating Tableau dashboards and visualizations, the Product Data Insights (PDI) team has an issue intake process where you can request their help.
- For specific questions on your Tableau worksheet or
dashboard, you can reach out to the PDI team on their slack channels:
- #data-tableau for Tableau-specific help
- #data for any data-related question
Troubleshooting
If the metrics don’t show up in Tableau or Snowplow you should contact
#g_monitor_analytics_instrumention
or #data_tableau
slack channel.
If, in Tableau, you can’t find either of the 2 data sources mentioned above,
make sure to use the New Data Source
button, then click the See All
link on
the right side above the table of available data sources.
d748cf8c
)