EPSS Support

This page contains information related to upcoming products, features, and functionality. It is important to note that the information presented is for informational purposes only. Please do not rely on this information for purchasing or planning purposes. The development, release, and timing of any products, features, or functionality may be subject to change or delay and remain at the sole discretion of GitLab Inc.
Status Authors Coach DRIs Owning Stage Created
ongoing YashaRise theoretick johncrowley tkopel nilieskou devops secure 2024-06-19

For important terms, see glossary.

Summary

EPSS scores specify the likelihood a CVE will be exploited in the next 30 days. This data may be used to improve and simplify prioritization efforts when remediating vulnerabilities in a project. EPSS support requirements are outlined in the EPSS epic along with an overview of EPSS. This document focuses on the technical implementation of EPSS support.

EPSS scores may be populated from the EPSS Data page or through the EPSS API. Ultimately, EPSS scores should be reachable through the GitLab GraphQL API, visible on the vulnerability report and details pages, and filterable and usable when setting policies.

The package metadata database (PMDB, also known as license-db), an existing advisory pull-and-enrichment method, is used for this purpose. The flow is as follows:

flowchart LR
    A[EPSS Source] -->|Pull| B[PMDB]
    B -->|Process and export| C[Bucket]
    C -->|Pull| D[GitLab Instance]

Motivation

The classic approach to vulnerability prioritization is using severity based on CVSS. This approach provides some guidance but is too coarse: more than half of all published CVEs have a high or critical score. Other metrics need to be employed to reduce remediation fatigue and help developers prioritize their work better. EPSS provides a metric to identify which vulnerabilities are most likely to be exploited in the near future. Combined with existing prioritization methods, EPSS helps to focus remediation efforts better and reduce remediation workload. By adding EPSS to the information presented to users, we deliver these benefits to the GitLab platform.

Goals

  • Enable users to use EPSS scores on GitLab as another metric for their vulnerability prioritization efforts.
  • Provide a scalable means of efficiently repopulating EPSS scores on a recurring basis to minimize system load.

Phase 1 (MVC)

  • Enable access to EPSS scores through GraphQL API.

Phase 2

  • Show EPSS scores in vulnerability report and details pages.

Phase 3

  • Allow filtering vulnerabilities based on EPSS scores.
  • Allow creating policies based on EPSS scores.

Non-Goals

  • Dictate priority to users based on EPSS (or any other metric).

Proposal

Support EPSS on the GitLab platform.

Following the discussions in the EPSS epic, the proposed flow is:

  1. The PMDB database is extended with a new table to store EPSS scores.
  2. PMDB infrastructure runs the feeder daily to pull and process EPSS data.
  3. The advisory-processor receives the EPSS data and stores it in the PMDB database.
  4. PMDB exports EPSS data to a new PMDB EPSS bucket.
    • Create a new bucket to store EPSS data.
    • Delete former EPSS data once new data is uploaded, as the old data is no longer needed.
    • Truncate EPSS scores to two digits after the dot.
  5. GitLab instances pull data from the PMDB EPSS bucket.
    • Create a new table in the Rails database to store EPSS data.
  6. GitLab instances expose EPSS data through GraphQL API and present data in vulnerability report and details pages.

flowchart LR
    AF[Feeder] -->|pulls| A[EPSS Source]
    AF -->|publishes| AP[Advisory Processor]
    AP -->|stores| DD[PMDB database]
    E[Exporter] -->|loads| DD
    E -->|exports| B[Public Bucket]
    GitLab[GitLab instance] -->|syncs| B
    GitLab -->|stores| GitLabDB[GitLab database]
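
Step 4 of the flow truncates scores to two digits after the dot before export. The following is a minimal sketch of that truncation, assuming scores arrive as decimal strings; the function name `truncate_score` is hypothetical and not taken from the PMDB code base.

```python
from decimal import Decimal, ROUND_DOWN


def truncate_score(score: str, places: int = 2) -> str:
    """Truncate (not round) a decimal string to `places` digits after the dot."""
    quantum = Decimal(10) ** -places  # Decimal("0.01") for places=2
    return str(Decimal(score).quantize(quantum, rounding=ROUND_DOWN))


# Illustrative values only.
print(truncate_score("0.979990000"))  # 0.97
print(truncate_score("0.000430000"))  # 0.00
```

Using `Decimal` with `ROUND_DOWN` avoids the binary floating-point surprises that can occur when truncating with `float` arithmetic.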

Design and implementation details

Decisions

Important notes

  • All EPSS scores get updated on a daily basis. This is pivotal to this feature’s design.
  • The fields retrieved from the EPSS source are cve, score, and percentile. Nine digits after the decimal point are maintained.
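
As an illustration of keeping the three fields at full precision, the sketch below parses one record into a small value object. It assumes the values arrive as strings keyed by the field names above; the `EpssRecord` type, the `parse_record` helper, and the sample CVE ID are made up for this example.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class EpssRecord:
    cve: str
    score: Decimal
    percentile: Decimal


def parse_record(raw: dict) -> EpssRecord:
    """Parse one raw record; Decimal keeps all nine digits, including trailing zeros."""
    return EpssRecord(
        cve=raw["cve"],
        score=Decimal(raw["score"]),
        percentile=Decimal(raw["percentile"]),
    )


# Hypothetical sample record, not real EPSS data.
record = parse_record(
    {"cve": "CVE-0000-0001", "score": "0.000430000", "percentile": "0.088140000"}
)
print(record.score)  # 0.000430000
```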

PMDB

  • Create a new EPSS table in PMDB with an advisory identifier and the EPSS score. This includes changing the schema and any necessary migrations.
  • Ingest EPSS data into the new PMDB table. We want to keep the EPSS data structure as close as possible to the origin so that all of the data is available to the exporter, and the exporter can choose how to process it. Therefore, we save scores and percentiles with their complete values.
  • Export EPSS scores to a separate bucket.
    • Delete the previous day’s export as it is no longer needed after the new one is added.
  • Add new Pub/Sub topics to the deployment to be used by PMDB components, using existing Terraform modules.
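
The export-then-delete behavior described above can be sketched as follows. This is only an illustration: a local directory stands in for the GCP bucket, and the CSV layout, file naming, and the `export_snapshot` function are assumptions rather than the actual exporter format.

```python
import csv
from pathlib import Path


def export_snapshot(bucket_dir: Path, day: str, rows: list[tuple[str, str]]) -> Path:
    """Write the full set of scores for `day`, then delete older exports."""
    bucket_dir.mkdir(parents=True, exist_ok=True)
    target = bucket_dir / f"{day}.csv"
    with target.open("w", newline="") as fh:
        writer = csv.writer(fh)
        writer.writerow(["cve", "score"])
        writer.writerows(rows)
    # Remove previous exports only after the new one has been written.
    for old in bucket_dir.glob("*.csv"):
        if old != target:
            old.unlink()
    return target
```

Deleting the old export only after the new one is in place means a consumer never finds the bucket empty.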

GitLab Rails backend

  • Create a table in the Rails backend to hold EPSS scores.
  • Configure Rails sync to ingest EPSS exports and save them to the new table.
  • Include EPSS data attributes in GraphQL API Occurrence objects.
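
Because every score changes daily, ingestion on the instance side is essentially an upsert keyed by CVE. The sketch below shows that idea with an in-memory SQLite database standing in for the GitLab database. The table name `epss_scores`, its columns, and the `ingest` helper are assumptions for illustration; the real implementation lives in the Rails application.

```python
import sqlite3


def ingest(conn: sqlite3.Connection, rows: list[tuple[str, float]]) -> None:
    """Insert new scores and overwrite existing ones for the same CVE."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS epss_scores "
        "(cve TEXT PRIMARY KEY, score REAL NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO epss_scores (cve, score) VALUES (?, ?) "
        "ON CONFLICT(cve) DO UPDATE SET score = excluded.score",
        rows,
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
ingest(conn, [("CVE-0000-0001", 0.12), ("CVE-0000-0002", 0.5)])  # day 1
ingest(conn, [("CVE-0000-0001", 0.13)])  # day 2 overwrites the first score
```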

GitLab UI

  • Add EPSS data to vulnerability report page.
  • Add EPSS data to vulnerability details page.
  • Allow filtering by EPSS score.
  • Allow creating policies based on EPSS score.

Alternative Solutions

Glossary

  • PMDB (Package metadata database, also known as License DB): PMDB is a standalone service (and not solely a database), outside of the Rails application, that gathers, stores, and exports package metadata for GitLab instances to consume. See complete documentation. PMDB components include:
    • Feeder: a scheduled job called by the PMDB deployment to publish data from the relevant sources to pub/sub messages consumed by PMDB processors.
    • Advisory processor: runs as a Cloud Run instance, consumes the advisory-related messages published by the advisory feeder, and stores their data in the PMDB database.
    • PMDB database: a PostgreSQL instance storing license and advisory data.
    • Exporter: exports license/advisory data from the PMDB database to public GCP buckets.
  • GitLab database: the database used by GitLab instances.
  • CVE (Common Vulnerabilities and Exposures): a list of publicly known information-security vulnerabilities. “A CVE” usually refers to a specific vulnerability and its CVE ID.
  • EPSS (Exploit Prediction Scoring System) score: a score ranging from 0 to 1 representing the probability that a given vulnerability will be exploited in the wild in the next 30 days.
  • EPSS score percentile: for a given EPSS score (of some vulnerability), the proportion of all scored vulnerabilities with the same or a lower EPSS score.
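
The percentile definition above can be made concrete with a small worked example. The function below follows the glossary wording directly; it is not necessarily how the EPSS source computes its published percentiles, and the scores are made up.

```python
def percentile(score: float, all_scores: list[float]) -> float:
    """Proportion of scored vulnerabilities with the same or a lower score."""
    at_or_below = sum(1 for s in all_scores if s <= score)
    return at_or_below / len(all_scores)


scores = [0.01, 0.02, 0.02, 0.5, 0.9]  # illustrative values only
print(percentile(0.02, scores))  # 0.6, because 3 of 5 scores are <= 0.02
```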

EPSS Support ADR 001: Export all EPSS entries

Context

PMDB advisories are exported using deltas. New advisories and advisories which have changed since the last export are exported each time the exporter runs. This can be observed in the PMDB advisory bucket, where exports are organized in directories titled with a timestamp. Each directory contains the items changed since the preceding directory’s timestamp. This is done to avoid very large updates of data.

EPSS scores specify the probability that a vulnerability will be exploited in the next 30 days. Therefore, all EPSS scores are updated every day. This renders the delta approach unhelpful: if almost all values change every day, the deltas will always be large and will save very few resources.
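
To make the argument concrete, the sketch below computes a delta between two daily snapshots. The `delta` helper and the sample values are hypothetical; the point is only that when nearly every score changes, the delta is nearly as large as the full export.

```python
def delta(previous: dict[str, str], current: dict[str, str]) -> dict[str, str]:
    """Entries that are new or whose value changed since the previous export."""
    return {cve: score for cve, score in current.items() if previous.get(cve) != score}


# Made-up snapshots: two of three entries changed, and one entry is new.
yesterday = {"CVE-0000-0001": "0.12", "CVE-0000-0002": "0.50", "CVE-0000-0003": "0.07"}
today = {
    "CVE-0000-0001": "0.13",
    "CVE-0000-0002": "0.50",
    "CVE-0000-0003": "0.08",
    "CVE-0000-0004": "0.01",
}
changed = delta(yesterday, today)
print(f"{len(changed)} of {len(today)} entries would be exported")
```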

EPSS Support ADR 002: Use a new bucket for EPSS data

Context

PMDB exports data to GCP buckets. The data is later pulled by GitLab instances. Advisory data and license data are stored in different buckets. This separation is sensible because advisory and license data are not directly related; each provides different additional information about packages. Data is updated based on deltas, that is, changes from the previous state of the data. Only those changes are saved with each addition to the database.

EPSS Support ADR 003: Use EPSS API over ZIP file

Context

The EPSS Feeder in PMDB retrieves data from the EPSS source and publishes it via GCP’s Pub/Sub. Two options were considered for this retrieval:

  1. Downloading a ZIP file containing EPSS data.
  2. Using the API to fetch data directly.

We chose the API because the EPSS feeder publishes data in batches through GCP, which maps naturally onto adding an offset and a limit to each request. Using a ZIP file would require extracting and saving all data locally, which is less efficient and less elegant. Additionally, API requests allow for controlled data sizes without handling local files. More details can be found in this issue.
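
The offset-and-limit batching can be sketched as a simple loop. In the example below, `fetch_page` is a hypothetical stand-in for one HTTP request to the EPSS API with the given offset and limit. No network call is made here, and the actual feeder may be structured differently.

```python
from typing import Callable, Iterator

Page = list[dict]


def iter_batches(fetch_page: Callable[[int, int], Page], limit: int) -> Iterator[Page]:
    """Yield pages of records until the source is exhausted."""
    offset = 0
    while True:
        page = fetch_page(offset, limit)
        if not page:
            return
        yield page  # each batch could be published as one Pub/Sub message
        if len(page) < limit:
            return  # a short page means this was the last one
        offset += limit


# Fake data source used in place of the real API.
records = [{"cve": f"CVE-0000-{i:04d}"} for i in range(5)]


def fake_fetch(offset: int, limit: int) -> Page:
    return records[offset:offset + limit]


batch_sizes = [len(batch) for batch in iter_batches(fake_fetch, limit=2)]
```

Each batch has a bounded size, so nothing needs to be written to local disk between fetching and publishing.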

Last modified August 23, 2024: Ensure frontmatter is consistent (e47101dc)