How to monitor and respond to issues with SAST Automatic Vulnerability Resolution?
When to use this runbook?
This runbook is intended to be used when there is a service degradation related to the SAST Automatic Vulnerability Resolution feature. Such degradation can be identified by monitoring the following:
- Sidekiq Error Rate (in the Static Analysis group dashboard) with Vulnerabilities::MarkDroppedAsResolvedWorker selected.
- Sidekiq execution Apdex and Error Ratio panels from the Static Analysis error budget.
SAST Automatic Vulnerability Resolution
The SAST Automatic Vulnerability Resolution feature is built to, as the name implies, automatically resolve vulnerabilities tied to SAST rules that have been disabled or removed.
The feature depends on a number of building blocks:
- Schema definition in security-report-schemas.
- SARIF module in analyzers/report package.
- Processing dropped identifier within Rails application.
Schema definition in security-report-schemas
Reports generated by security analyzer scans have their JSON schemas defined in the security-report-schemas repository. Automatic vulnerability resolution depends on one schema field, primary_identifiers, which is part of security-report-format, the parent schema for all other security report schemas, including sast-report-format.
SARIF module in analyzers/report package
The primary_identifiers field contains an exhaustive list of all identifiers for which the analyzer scans (as opposed to the identifiers actually detected), so a report may contain zero vulnerabilities while scan.primary_identifiers still contains the full list. The list is generated while transforming a SARIF file into a SAST security report in the sarif.go module under the analyzers/report package.
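To illustrate the point above, here is a minimal Python sketch of a report with no vulnerabilities whose scan.primary_identifiers is still fully populated. It is not taken from the analyzer code: the identifier values and the helper name primary_identifier_keys are hypothetical, and the type/name/value shape of each identifier is an assumption about the report schema.

```python
import json

# Hypothetical, trimmed-down SAST report: no findings, but the scan still
# declares every identifier (rule) the analyzer was configured to check.
REPORT_JSON = """
{
  "vulnerabilities": [],
  "scan": {
    "primary_identifiers": [
      {"type": "semgrep_id", "name": "example.rule-1", "value": "example.rule-1"},
      {"type": "semgrep_id", "name": "example.rule-2", "value": "example.rule-2"}
    ]
  }
}
"""


def primary_identifier_keys(report):
    """Return the (type, value) pairs listed in scan.primary_identifiers.

    A missing or empty field yields an empty set.
    """
    identifiers = report.get("scan", {}).get("primary_identifiers") or []
    return {(item["type"], item["value"]) for item in identifiers}


report = json.loads(REPORT_JSON)
keys = primary_identifier_keys(report)
```

Here the report has zero vulnerabilities, yet keys still holds both scanned identifiers, which is the information the later resolution step relies on.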
Dropped identifier processing within Rails application
While ingesting a security report within the gitlab-org/gitlab application, the IngestReportsService iterates through scan primary identifiers and executes ScheduleMarkDroppedAsResolvedService for each scan type, which in turn schedules MarkDroppedAsResolvedWorker. The worker loops through all vulnerabilities with identifiers matching the disabled or dropped identifiers (i.e. no longer present in the latest scan).
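The selection of dropped identifiers can be pictured as a set difference. The Python sketch below is not the Rails implementation; the function names, the flat string identifiers, and the dictionary shape of a vulnerability are all assumptions made for illustration.

```python
def dropped_identifiers(known_identifiers, primary_identifiers):
    """Identifiers seen on existing vulnerabilities but absent from the latest scan."""
    if not primary_identifiers:
        # Without primary_identifiers there is nothing to compare against,
        # so nothing is treated as dropped.
        return set()
    return set(known_identifiers) - set(primary_identifiers)


def vulnerabilities_to_resolve(vulnerabilities, dropped):
    """Keep only the vulnerabilities whose identifier was dropped."""
    return [v for v in vulnerabilities if v["identifier"] in dropped]


# Hypothetical data: one rule was removed from the analyzer's ruleset.
existing = [
    {"id": 1, "identifier": "noisy-rule-123"},
    {"id": 2, "identifier": "useful-rule-456"},
]
latest_scan = ["useful-rule-456", "new-rule-789"]

dropped = dropped_identifiers((v["identifier"] for v in existing), latest_scan)
to_resolve = vulnerabilities_to_resolve(existing, dropped)
```

In this example only the vulnerability tied to noisy-rule-123 is selected, because that identifier no longer appears in the latest scan.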
Below is a diagram showing the complete flow of the automatic vulnerability resolution feature.
flowchart TB
  code --> analyzer_pipeline
  subgraph analyzer_pipeline["analyzer pipeline"]
    direction LR
    analyzer["semgrep analyzer"] --> report_a["noisy-rule-123 dropped"]
    report_a --> report_b["scan.identifiers populated"]
    report_b --> report_c("gl-sast-report.json")
  end
  analyzer_pipeline --> rails_application
  subgraph rails_application["rails application"]
    ingest["IngestReportsService"] --> schedule["ScheduleMarkDroppedAsResolved"]
    schedule --> worker["MarkDroppedAsResolvedWorker"]
  end
Monitoring
To monitor automatic vulnerability resolution, there are two primary sources of information: sentry.io, which lists any errors occurring in the MarkDroppedAsResolvedWorker class over the last 24 hours, and the SAST Engineering dashboard on Kibana, which includes a number of panels monitoring certain workers and showing the volume of uploaded reports. See below for a list of panels of interest and a brief description of each.
SAST Report Uploads
Displays the 90th percentile of the file size of uploaded security reports, per 30 minutes. This is useful for seeing how large (or small) the security reports uploaded over a given period are.
SAST Failing Workers Distribution
Shows the distribution of SAST-related Sidekiq workers failing over a period of time.
Vulnerabilities::MarkDroppedAsResolvedWorker Execution Time
Displays the 75th and 95th percentiles of the worker’s execution time.
Vulnerabilities::MarkDroppedAsResolvedWorker Job Status
Shows the count of job executions, split by job status, per hour. This is useful to gauge the number of failing, deduplicated, or successful executions over a given period.
Top Projects for MarkDroppedAsResolvedWorker Executions
Shows the top projects listed by the count of their worker executions. This can be useful to see if a certain customer is experiencing an issue.
Logs
Additionally, you may want to check the following two saved searches in production logs:
- Vulnerabilities::MarkDroppedAsResolvedWorker – Total Executions.
- Vulnerabilities::MarkDroppedAsResolvedWorker – Executions with DB Writes.
What to do if something goes wrong?
- Start by looking at the monitoring section above. Check if MarkDroppedAsResolvedWorker has any failures.
- Look at the logs, and see if the issue is possibly due to a query timing out while executing a database write operation (e.g. trying to resolve a huge number of findings).
- Consider turning automatic vulnerability resolution off.
Possible Checks
- If there’s an increase in error rates in relation to automatic vulnerability resolution, there’s a possibility it is related to this timeout issue, which occurs when a very high number of vulnerability findings are being resolved.
How to turn automatic vulnerability resolution off?
The presence of primary_identifiers is required for report ingestion and automatic vulnerability resolution. If automatic vulnerability resolution is not working as expected, consider stopping automatic resolution by ensuring scans do not have primary_identifiers included in the generated reports. To do so, consider one of the following options:
- Update the sarif.go module to revert the change introduced in this merge request.
- Update the ScheduleMarkDroppedAsResolvedService#dropped_identifiers method to return early regardless of the existence of primary_identifiers.