How to monitor and respond to issues with SAST Automatic Vulnerability Resolution?
When to use this runbook?
This runbook is intended to be used when there is a service degradation related to the SAST Automatic Vulnerability Resolution feature. Such degradation can be identified by monitoring the following:
- Sidekiq Error Rate (in the Static Analysis group dashboard) with Vulnerabilities::MarkDroppedAsResolvedWorker selected.
- Sidekiq execution Apdex and Error Ratio panels from the Static Analysis error budget.
SAST Automatic Vulnerability Resolution
The SAST Automatic Vulnerability Resolution feature is built to, as the name implies, automatically resolve vulnerabilities tied to SAST rules that have been disabled or removed.
The feature depends on a number of building blocks:
- Schema definition in security-report-schemas.
- SARIF module in analyzers/report package.
- Processing dropped identifier within Rails application.
Schema definition in security-report-schemas
Reports generated by security analyzer scans have their JSON schemas defined in the security-report-schemas repository. Automatic vulnerability resolution depends on one schema field, primary_identifiers, which is part of security-report-format. The latter is the parent schema for all other security report schemas, including sast-report-format.
SARIF module in analyzers/report package
The primary_identifiers field contains an exhaustive list of all identifiers the analyzer scans for (as opposed to only the identifiers it detected), so a report may have zero vulnerabilities while scan.primary_identifiers still contains the full list. The list is generated while transforming a SARIF file into a SAST security report in the sarif.go module under the analyzers/report package.
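To make the "zero vulnerabilities, full identifier list" case concrete, here is a minimal sketch in Ruby. It is not the sarif.go implementation, which is written in Go and is not reproduced in this runbook. The SARIF fragment, the rule IDs, the helper name primary_identifiers, and the identifier shape (type, name, value, with type semgrep_id) are all assumptions made for illustration.

```ruby
require "json"

# Hypothetical SARIF fragment: two rules are loaded, but the scan produced no results.
sarif = JSON.parse(<<~SARIF)
  {
    "runs": [
      {
        "tool": { "driver": { "name": "semgrep", "rules": [
          { "id": "rules.sql-injection" },
          { "id": "rules.weak-hash" }
        ] } },
        "results": []
      }
    ]
  }
SARIF

# Build one identifier per rule the analyzer scanned for, regardless of findings.
# The identifier fields used here are an assumption, not the confirmed schema.
def primary_identifiers(sarif)
  sarif.fetch("runs").flat_map do |run|
    run.dig("tool", "driver", "rules").to_a.map do |rule|
      { "type" => "semgrep_id", "name" => rule["id"], "value" => rule["id"] }
    end
  end
end

report = {
  "vulnerabilities" => [],
  "scan" => { "primary_identifiers" => primary_identifiers(sarif) }
}

puts JSON.pretty_generate(report)
```

The point of the sketch is that the identifier list is derived from the rules the analyzer ran, not from its results, which is why the list stays populated when nothing is detected.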
Dropped identifier processing within Rails application
While ingesting a security report within the gitlab-org/gitlab application, the IngestReportsService iterates through the scan primary identifiers and executes ScheduleMarkDroppedAsResolvedService for each scan type, which in turn schedules MarkDroppedAsResolvedWorker. The worker loops through all vulnerabilities whose identifiers match the disabled or dropped identifiers (i.e. identifiers no longer present in the latest scan) and marks them as resolved.
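The core idea of the step above can be sketched as a set difference. This is a simplified, in-memory illustration, not the Rails code: the Vulnerability struct, both helper methods, and the choice to resolve only vulnerabilities in the "detected" state are assumptions.

```ruby
require "set"

# Hypothetical stand-in for a stored vulnerability; not the Rails model.
Vulnerability = Struct.new(:id, :identifier, :state)

# Returns identifiers seen on stored vulnerabilities that the latest scan no longer lists.
def dropped_identifiers(vulnerabilities, primary_identifiers)
  current = primary_identifiers.to_set
  vulnerabilities.map(&:identifier).uniq.reject { |identifier| current.include?(identifier) }
end

# Marks still-detected vulnerabilities tied to dropped identifiers as resolved.
# Skipping other states (such as dismissed) is an assumption of this sketch.
def mark_dropped_as_resolved(vulnerabilities, dropped)
  vulnerabilities.each do |vulnerability|
    next unless vulnerability.state == "detected" && dropped.include?(vulnerability.identifier)

    vulnerability.state = "resolved"
  end
end

stored = [
  Vulnerability.new(1, "noisy-rule-123", "detected"),
  Vulnerability.new(2, "sql-injection", "detected"),
  Vulnerability.new(3, "noisy-rule-123", "dismissed")
]
latest_primary_identifiers = ["sql-injection", "weak-hash"]

dropped = dropped_identifiers(stored, latest_primary_identifiers)
mark_dropped_as_resolved(stored, dropped)
stored.each { |v| puts "#{v.id}: #{v.state}" }
```

Note that an empty or missing primary identifier list would make every stored identifier look dropped in this naive form, which is one reason a guard on the presence of primary_identifiers matters.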
Below is a diagram showing the complete flow of the automatic vulnerability resolution feature.
flowchart TB
code --> analyzer_pipeline
subgraph analyzer_pipeline["analyzer pipeline"]
direction LR
analyzer["semgrep analyzer"] --> report_a["noisy-rule-123 dropped"]
report_a --> report_b["scan.identifiers populated"]
report_b --> report_c("gl-sast-report.json")
end
analyzer_pipeline --> rails_application
subgraph rails_application["rails application"]
ingest["IngestReportsService"] --> schedule["ScheduleMarkDroppedAsResolved"]
schedule --> worker["MarkDroppedAsResolvedWorker"]
end
Monitoring
To monitor automatic vulnerability resolution, there are two primary sources of information: sentry.io, which lists any errors occurring in the MarkDroppedAsResolvedWorker class over the last 24 hours, and the SAST Engineering dashboard on Kibana, which includes a number of panels monitoring certain workers and showing the volume of uploaded reports. See below for a list of panels of interest and a brief description of each.
SAST Report Uploads
Displays the 90th percentile of the file size of uploaded security reports, per 30 minutes. This is useful to see how big (or small) the security reports uploaded over a given period are.
SAST Failing Workers Distribution
Shows the distribution of failing SAST-related Sidekiq workers over a period of time.
Vulnerabilities::MarkDroppedAsResolvedWorker Execution Time
Displays the 75th and 95th percentiles of the worker’s execution time.
Vulnerabilities::MarkDroppedAsResolvedWorker Job Status
Shows the count of job executions, split by job status, per hour. This is useful to gauge the amount of failing, deduplicated, or successful executions over a certain amount of time.
Top Projects for MarkDroppedAsResolvedWorker Executions
Shows the top projects listed by the count of their worker executions. This can be useful to see if a certain customer is experiencing an issue.
Logs
Additionally, you may want to check the following two saved searches in production logs:
- Vulnerabilities::MarkDroppedAsResolvedWorker – Total Executions.
- Vulnerabilities::MarkDroppedAsResolvedWorker – Executions with DB Writes.
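If you export log entries and want to reproduce roughly what these two saved searches show, a sketch like the following may help. The log lines are made up, and the field names (class, job_status, db_write_count) are assumptions about the structured log format, so check them against the actual entries before relying on this.

```ruby
require "json"

# Hypothetical log lines; field names and values are assumptions.
log_lines = [
  '{"class":"Vulnerabilities::MarkDroppedAsResolvedWorker","job_status":"done","db_write_count":3}',
  '{"class":"Vulnerabilities::MarkDroppedAsResolvedWorker","job_status":"done","db_write_count":0}',
  '{"class":"Vulnerabilities::MarkDroppedAsResolvedWorker","job_status":"fail","db_write_count":1}',
  '{"class":"SomeOtherWorker","job_status":"done","db_write_count":5}'
]

WORKER = "Vulnerabilities::MarkDroppedAsResolvedWorker"

# Keep only entries for the worker of interest (total executions).
entries = log_lines.map { |line| JSON.parse(line) }.select { |entry| entry["class"] == WORKER }

# Count executions per job status.
status_counts = entries.each_with_object(Hash.new(0)) do |entry, counts|
  counts[entry["job_status"]] += 1
end

# Count executions that performed at least one database write.
with_db_writes = entries.count { |entry| entry["db_write_count"].to_i > 0 }

puts "total executions: #{entries.size}"
puts "by status: #{status_counts}"
puts "executions with DB writes: #{with_db_writes}"
```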
What to do if something goes wrong?
- Start by looking at the monitoring section above. Check if MarkDroppedAsResolvedWorker has any failures.
- Look at the logs, and see if the issue is possibly due to a query timing out while executing a database write operation (e.g. trying to resolve a huge number of findings).
- Consider turning automatic vulnerability resolution off.
Possible Checks
- If there’s an increase in error rates related to automatic vulnerability resolution, there’s a possibility it could be related to this timeout issue, which occurs when a very high number of vulnerability findings are being resolved.
How to turn automatic vulnerability resolution off?
The presence of primary_identifiers is required for report ingestion and automatic vulnerability resolution. If automatic vulnerability resolution is not working as expected, consider stopping automatic resolution by ensuring scans do not have primary_identifiers included in the generated reports. To do so, consider one of the following options:
- Update the sarif.go module to revert the change introduced in this merge request.
- Update the ScheduleMarkDroppedAsResolvedService#dropped_identifiers method to return early regardless of the existence of primary_identifiers.
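As a rough illustration of the second option, the sketch below shows what an unconditional early return could look like. The class is a hypothetical stand-in: the real service's constructor, internals, and return type are not shown in this runbook, and the disabled: argument exists only so the sketch can show both behaviours.

```ruby
# Hypothetical stand-in for the service; the real class lives in the Rails application.
class ScheduleMarkDroppedAsResolvedSketch
  # disabled: is a hypothetical switch used only in this sketch.
  def initialize(known_identifiers, primary_identifiers, disabled: false)
    @known_identifiers = known_identifiers
    @primary_identifiers = primary_identifiers
    @disabled = disabled
  end

  def dropped_identifiers
    # Return early regardless of whether primary_identifiers is present,
    # so nothing is ever reported as dropped and nothing gets resolved.
    return [] if @disabled

    @known_identifiers - @primary_identifiers
  end
end

known = ["noisy-rule-123", "sql-injection"]
primary = ["sql-injection"]

enabled_result = ScheduleMarkDroppedAsResolvedSketch.new(known, primary).dropped_identifiers
disabled_result = ScheduleMarkDroppedAsResolvedSketch.new(known, primary, disabled: true).dropped_identifiers

puts "enabled: #{enabled_result.inspect}"
puts "disabled: #{disabled_result.inspect}"
```

With no dropped identifiers returned, there is nothing for the worker to resolve, while report ingestion itself is unaffected.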
