# Composition Analysis Analyzers Metrics
| Status | Authors | Coach | DRIs | Owning Stage | Created |
|---|---|---|---|---|---|
| implemented | nilieskou | mbenayoun | mbenayoun, zmartins | devops secure | 2025-11-01 |
## Summary
This document describes the architectural design for implementing comprehensive end-to-end metrics collection for GitLab’s Composition Analysis (CA) feature area. The goal is to gather better data on how customers use CA tools to inform product decisions and measure feature adoption across different deployment types (GitLab.com, self-managed, and dedicated instances).
Epic: Create end-to-end metrics for CA (#18116)
## Motivation
Currently, GitLab lacks comprehensive metrics on Composition Analysis usage. Legacy metrics may be incomplete or inaccurate. The team needs better observability into:
- Breadth of adoption: How many people, organizations, and projects use CA tools
- Feature usage: Which specific features are being used (Dependency Scanning, Container Scanning, Static Reachability, etc.)
- Intensity of usage: How frequently are scans run, how many vulnerabilities are found, and what actions are taken
- Quality metrics: Scan success rates, error patterns, and performance characteristics
- Configuration patterns: How users configure and customize CA analyzers
This data is essential for:
- Measuring migration progress (e.g., from Gemnasium to the new DS analyzer)
- Prioritizing feature development based on actual usage patterns
- Understanding performance characteristics across different project types
- Tracking feature adoption (e.g., Static Reachability enablement)
## Goals
- Comprehensive data collection: Capture metrics across all CA analyzers (Dependency Scanning, Container Scanning, OCS, CVS)
- Multi-deployment support: Collect metrics from GitLab.com, self-managed, and dedicated instances
- Granular insights: Provide data at project, namespace, and instance levels
- Performance tracking: Monitor scan duration, resource usage, and success rates
- Feature adoption: Track which features and configurations are being used
- Iterative approach: Build incrementally, starting with Dependency Scanning
## Proposal
We maintain an event-registry project that defines all analyzer events. Each analyzer collects events and includes them in the observability section of the Vulnerability Report. When the dependency-scanning job completes, GitLab ingests the generated report and stores the events in Snowflake. A Tableau dashboard provides visualization by querying the Snowflake data.
Note that most data comes from GitLab.com; self-managed instances only contribute data if users opt in to sending telemetry to GitLab.
```mermaid
flowchart TD
    subgraph Analyzers["CI-based Analyzers"]
        G["Gemnasium"]
        GM["Gemnasium-Maven"]
        GP["Gemnasium-Python"]
        DS["DS Analyzer"]
    end
    subgraph EventData["Event Data Collected"]
        E["Scan Metrics<br/>Performance Metrics<br/>SBOM Metrics<br/>Static Reachability Metrics"]
    end
    subgraph SecurityReport["Security Report"]
        SR["gl-dependency-scanning-report.json<br/>with observability section"]
    end
    subgraph GitLab["GitLab"]
        ING["Ingest Report"]
        IET["track_internal_event()"]
    end
    subgraph Storage["Data Warehouse"]
        SF["Snowflake"]
    end
    subgraph Visualization["Visualization"]
        TB["Tableau Dashboard"]
    end
    G & GM & GP & DS --> E
    E --> SR
    SR --> ING
    ING --> IET
    IET --> SF
    SF --> TB
```
## Key Components
### 1. Observability Data in Security Reports
Analyzers embed observability data directly in the security report JSON under the observability section. This includes:
- Scan metadata: Duration, analyzer version, number of vulnerabilities found
- SBOM data: Component counts, PURL types, input file paths
- Static Reachability metrics: Coverage, execution time, in_use components
- Performance metrics: Execution time for different phases
Below you can see an example of an SBOM event:
```json
{
  "event": "collect_ds_analyzer_scan_sbom_metrics_from_pipeline",
  "property": "scan_uuid",
  "label": "npm",
  "value": 150,
  "input_file_path": "yarn.lock"
}
```
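
For illustration, the Go sketch below builds the same event and marshals it to the JSON shape shown above. Only the JSON keys and values come from the example; the struct and field names are assumptions, not the analyzer's actual types.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// sbomMetricsEvent mirrors the JSON keys of the example SBOM event above.
// The Go type and field names are illustrative, not the analyzer's real API.
type sbomMetricsEvent struct {
	Event         string `json:"event"`
	Property      string `json:"property"` // scan UUID, used to join related events
	Label         string `json:"label"`    // PURL type, for example "npm"
	Value         int    `json:"value"`    // component count for this input file
	InputFilePath string `json:"input_file_path"`
}

func main() {
	ev := sbomMetricsEvent{
		Event:         "collect_ds_analyzer_scan_sbom_metrics_from_pipeline",
		Property:      "scan_uuid", // placeholder; in practice this holds the scan's UUID
		Label:         "npm",
		Value:         150,
		InputFilePath: "yarn.lock",
	}
	out, _ := json.MarshalIndent(ev, "", "  ")
	fmt.Println(string(out))
}
```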
### 2. Event Ingestion Pipeline
When a pipeline completes, GitLab processes scan events through:
- `ProcessScanEventsService` (`ee/app/services/security/process_scan_events_service.rb`)
  - Extracts observability data from security reports
  - Validates events against `EVENT_NAME_ALLOW_LIST` (see the sketch after this list)
  - Tracks internal events using `track_internal_event()`
- Event Tracking
  - Events are sent to Snowplow (GitLab.com)
  - Each event contains: event name, property, label, value, and extra (JSON) fields
  - Important filtering/joining data goes in the property/label/value columns
  - Additional data goes in the extra field (less performant for queries)
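
The service itself is written in Ruby, so the following Go snippet is only a conceptual sketch of the allow-list check it performs; the map contents and helper names are hypothetical.

```go
package main

import "fmt"

// eventNameAllowList plays the role of EVENT_NAME_ALLOW_LIST: only known
// event names are tracked, anything else is dropped. Entries are hypothetical.
var eventNameAllowList = map[string]bool{
	"collect_ds_analyzer_scan_sbom_metrics_from_pipeline": true,
}

// reportEvent is a minimal stand-in for an event read from the report's
// observability section.
type reportEvent struct {
	Event string
	Value int
}

// filterAllowedEvents keeps only events whose name is on the allow list,
// mirroring the validation step before track_internal_event() is called.
func filterAllowedEvents(events []reportEvent) []reportEvent {
	var allowed []reportEvent
	for _, ev := range events {
		if eventNameAllowList[ev.Event] {
			allowed = append(allowed, ev)
		}
	}
	return allowed
}

func main() {
	events := []reportEvent{
		{Event: "collect_ds_analyzer_scan_sbom_metrics_from_pipeline", Value: 150},
		{Event: "unknown_event", Value: 1},
	}
	fmt.Printf("%d of %d events pass the allow list\n", len(filterAllowedEvents(events)), len(events))
}
```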
### 3. Event Registry
A centralized Go package manages all CA analyzer events:
- Project: gitlab-org/security-products/analyzers/events
- Purpose: Single source of truth for event definitions
- Reusability: Shared between Gemnasium and DS analyzer
- Maintainability: Centralized event definitions reduce duplication
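
The document does not show the package's API, so the sketch below only suggests what a shared registry could look like; every identifier is an assumption.

```go
// Package events sketches a shared CA event registry. The real
// gitlab-org/security-products/analyzers/events package may differ;
// all names below are illustrative.
package events

// Known event names, defined once and reused by Gemnasium and the DS analyzer.
const (
	DSAnalyzerSBOMMetrics = "collect_ds_analyzer_scan_sbom_metrics_from_pipeline"
)

// BaseEvent carries the three fast columns shared by every CA event.
type BaseEvent struct {
	Event    string `json:"event"`
	Property string `json:"property"` // scan UUID, for joining related events
	Label    string `json:"label"`
	Value    int    `json:"value"`
}

// NewSBOMMetricsEvent builds an SBOM metrics event for one input file.
func NewSBOMMetricsEvent(scanUUID, purlType string, componentCount int) BaseEvent {
	return BaseEvent{
		Event:    DSAnalyzerSBOMMetrics,
		Property: scanUUID,
		Label:    purlType,
		Value:    componentCount,
	}
}
```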
## Design Decisions
### 1. Observability Data in Security Reports vs. Separate Artifacts
Decision: Embed observability data directly in the security report rather than creating separate artifacts
Pros:
- Keeps related data together in one place
- Builds on the existing security report infrastructure and ingestion logic, which already defines how events are processed and stored in Snowflake
- Single artifact to manage and store
Cons:
- Slightly increases security report size
- Binds metrics to the report
- Requires additional work to collect data when the analyzer fails
### 2. Multiple Events vs. Single Monolithic Event
Decision: Create separate events for different data types (scan, SBOM, SR, features) rather than a single event containing all data
Pros:
- Reduces fields per event, improving query efficiency in Snowflake. We try to avoid using custom event fields.
- Allows independent scaling of different event types
- Easier to add new event types without modifying existing ones
- Better separation of concerns
Cons:
- Requires joining events using scan_uuid
- More events generated per scan
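
To make the `scan_uuid` join concrete, the hypothetical sketch below groups the separate events of one scan by the shared UUID carried in their property field; in Snowflake the equivalent is a join on that column. Event names other than the SBOM one are assumptions.

```go
package main

import "fmt"

type event struct {
	Name     string
	Property string // shared scan UUID
	Value    int
}

// groupByScanUUID stitches the separate events of one scan back together,
// mirroring what a dashboard query does by joining on the property column.
func groupByScanUUID(events []event) map[string][]event {
	grouped := make(map[string][]event)
	for _, ev := range events {
		grouped[ev.Property] = append(grouped[ev.Property], ev)
	}
	return grouped
}

func main() {
	scanUUID := "3f8e5c2a-1111-2222-3333-444455556666" // hypothetical
	events := []event{
		{Name: "collect_ds_analyzer_scan_metrics_from_pipeline", Property: scanUUID, Value: 12}, // assumed name
		{Name: "collect_ds_analyzer_scan_sbom_metrics_from_pipeline", Property: scanUUID, Value: 150},
	}
	for uuid, evs := range groupByScanUUID(events) {
		fmt.Printf("scan %s produced %d events\n", uuid, len(evs))
	}
}
```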
### 3. Event Data Storage: Fast Columns vs. JSON Extra Field
Decision: Store important filtering/joining data in the property/label/value columns; avoid storing additional data in the extra JSON field where possible
Fast columns: Every event is based on a base event class that contains three fields:
- `property`: CA always uses this field for the `scan_uuid` (for joining related events)
- `label`: CA uses it for data such as the analyzer version or PURL type
- `value`: CA uses it for numeric data such as component count, execution time, or vulnerability count
Extra Field: Events can include additional fields, which are stored in a jsonb column. Querying these extra fields is more resource-intensive than querying standard columns.
Pros:
- Fast filtering and joining on important dimensions
- Flexible for additional metrics without schema changes
Cons:
- Requires careful planning of what goes where
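
As a sketch of this split, assuming illustrative identifiers, an event type could keep the three fast columns as plain fields and push everything else into the extra payload:

```go
// Sketch of the fast-column/extra split; all identifiers are illustrative.
package events

// MetricsEvent separates the three fast columns from the extra payload.
type MetricsEvent struct {
	Event    string         `json:"event"`
	Property string         `json:"property"`        // scan UUID: keep joining keys here
	Label    string         `json:"label"`           // for example analyzer version or PURL type
	Value    int            `json:"value"`           // for example component count or execution time
	Extra    map[string]any `json:"extra,omitempty"` // everything else; lands in a jsonb column and is slower to query
}
```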
### 4. Centralized Event Registry vs. Distributed Event Definitions
Decision: Create a centralized Go package for all CA analyzer events
Package: gitlab-org/security-products/analyzers/events
Pros:
- Single source of truth for event definitions
- Easier to discover available events
- Reduces duplication between analyzers
- Could facilitate reuse across AST groups
- Simplifies event versioning and changes
Cons:
- Adds complexity: changes require updates in 2 projects (event registry + analyzer)
- Adds a dependency between projects
### 5. Gemnasium Flavor Differentiation
Decision: Distinguish between the 3 Gemnasium flavors directly in the event names. This eliminates the need for a separate field to indicate which Gemnasium flavor the data refers to—the information is encoded in the event name itself.
Flavors:
- `gemnasium`
- `gemnasium-maven`
- `gemnasium-python`
Implementation: Store flavor information in event name
Pros:
- Optimizes event storage: no separate field is needed to record the flavor
Cons:
- Requires tracking multiple analyzer variants
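
As a minimal illustration of encoding the flavor in the event name, the Go sketch below builds one name per flavor; the exact naming pattern is an assumption, not the production scheme.

```go
package main

import "fmt"

// The three Gemnasium flavors listed above.
var flavors = []string{"gemnasium", "gemnasium-maven", "gemnasium-python"}

// scanMetricsEventName encodes the flavor directly in the event name, so no
// separate field is needed to tell the flavors apart. The pattern is
// illustrative only.
func scanMetricsEventName(flavor string) string {
	return fmt.Sprintf("collect_%s_scan_metrics_from_pipeline", flavor)
}

func main() {
	for _, flavor := range flavors {
		fmt.Println(scanMetricsEventName(flavor))
	}
}
```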
## Future work
Future work includes:
- Failure event tracking: We need to modify analyzers to generate vulnerability reports even when scans fail. This will enable us to collect failure metrics and, more importantly, build a system for capturing warnings and errors that can be surfaced to users. This capability will be particularly valuable for organization administrators monitoring scan health across projects.
- Extending event coverage: Implement similar event tracking for Container Scanning and Operational Container Scanning.
