Static Reachability

This page contains information related to upcoming products, features, and functionality. It is important to note that the information presented is for informational purposes only. Please do not rely on this information for purchasing or planning purposes. The development, release, and timing of any products, features, or functionality may be subject to change or delay and remain at the sole discretion of GitLab Inc.
Status Authors Coach DRIs Owning Stage Created
implemented nilieskou mbenayoun joelpatterson nilieskou devops secure 2025-10-01

Summary

Static Reachability is a feature within GitLab’s Dependency Scanning that determines whether a dependency is actively used (imported) by the application code. This contextual information enriches vulnerability findings, enabling users to prioritize remediation efforts by focusing on vulnerabilities in dependencies that are actually reachable from their codebase, rather than treating all vulnerabilities equally regardless of their actual impact.

The feature supports Java, JavaScript/TypeScript, and Python projects, integrating seamlessly into the existing Dependency Scanning pipeline workflow. Reachability data is surfaced in the Vulnerability Report and Vulnerability Details pages, with filtering capabilities to help security teams triage findings more effectively.

Motivation

Security teams face a critical challenge in vulnerability management: they receive overwhelming volumes of vulnerability findings from dependency scanning, but lack the context needed to prioritize remediation efforts effectively. This creates decision paralysis and inefficient resource allocation. Currently, teams rely on three generic data points to assess risk:

CVSS Severity: A standardized but context-agnostic score EPSS Score: Exploit prediction based on threat intelligence KEV (Known Exploited Vulnerabilities): Catalog of actively exploited CVEs

While valuable, these metrics tell only part of the story. A critical vulnerability in an unused library poses significantly less risk than a medium-severity vulnerability in a heavily-used dependency. Without this contextual insight, teams waste time investigating and remediating vulnerabilities that have minimal actual impact on their applications.

By introducing reachability data we enable teams to filter noise, focus effort, reduce time-to-remediation and improve security posture. Reachability analysis is increasingly expected in modern security tools. By implementing this feature, GitLab strengthens its competitive position in the Application Security Testing market and delivers tangible value to customers managing complex dependency ecosystems.

Goals

  1. Provide reachability context: Determine whether dependencies are actively imported in the application code
  2. Reduce noise: Allow users to filter and deprioritize vulnerabilities in unused dependencies
  3. Improve remediation efficiency: Enable faster, more informed triage decisions
  4. Support major ecosystems: Cover Java (Maven), Python (PyPI), and JavaScript/TypeScript (npm, pnpm, yarn)
  5. Enable offline environments: Support air-gapped and self-managed instances

Non-Goals

  1. Dynamic/Runtime Reachability: This design focuses on static analysis only; runtime call graph analysis is out of scope
  2. Cross-file deep analysis: To maintain performance, the analysis does not perform exhaustive cross-file data flow tracking
  3. Automatic severity adjustment: Reachability status is informational only
  4. Function-level reachability: The current scope is dependency-level (is the library used?), not function-level (is the vulnerable function called?)

Proposal

Static Reachability requires package metadata to function correctly across different programming languages. For instance, in Python, the package name listed in an SBOM may differ from the name used in import statements. In Java, imports reference specific paths within a library rather than the library name itself. To perform Static Reachability analysis, we need mappings that connect package names to their corresponding import identifiers—whether that’s the actual import name in Python or the internal import paths in Java. JavaScript and TypeScript do not require this metadata because package names in npm registries match the import names used in code, eliminating the need for translation mappings.

The SR-metadata repository maintains up-to-date PyPI and Maven package metadata, refreshed weekly through language-specific scrapers. This metadata is bundled into the DS analyzer’s Docker image to support offline usage.

Static Reachability runs as part of Dependency Scanning within CI/CD pipelines and is available in the new DS analyzer. The analyzer first generates SBOMs, then attempts Static Reachability analysis for each supported language when the feature is enabled. Since SR can extend execution time, it’s disabled by default. The analysis uses semgrep-core to identify import statements and cross-references them with SR-metadata to determine which SBOM components are actively used. Reachability information is embedded in the SBOM reports and ingested when the Dependency Scanning job completes. Vulnerabilities include a reachability field that appears in both the vulnerability report and detail pages, allowing users to filter by reachability state.

Design and implementation details

High Level Design

Below you can see a diagram on how Static Reachability Analysis works and integrates with the GitLab Platform.

graph TB
    subgraph Metadata["Metadata Scraping Pipeline (Weekly)"]
        Trigger["⏰ Scheduled Pipeline<br/>(Once per week)"]
        PyPI["PyPI Scraper<br/>(Extract module names)"]
        Maven["Maven Scraper<br/>(Extract package info)"]
        
        Trigger -->|triggers| PyPI
        Trigger -->|triggers| Maven
    end
    
    subgraph MetadataRepo["SR-Metadata Repository<br/>(Git)"]
        PyPIData["PyPI Module Data<br/>(library → module mapping)"]
        MavenData["Maven Package Data<br/>(package metadata)"]
    end
    
    PyPI -->|writes| PyPIData
    Maven -->|writes| MavenData
    
    subgraph DockerImage["<b>DS Analyzer Docker Image</b><br/>"]
        BundledMeta["Bundled SR-Metadata<br/>(PyPI + Maven data)"]
    end
    
    PyPIData -->|bundled into| BundledMeta
    MavenData -->|bundled into| BundledMeta
    
    subgraph DSAnalyzer["Dependency Scanning Analyzer"]
        SBOM["1️⃣ Generate SBOM<br/>(Parse lockfiles)"]
        
        subgraph SRCheck["2️⃣ Check SR Eligibility"]
            LangCheck["Supported Language?<br/>(Java/Python/JS)"]
            SREnabled["SR Enabled?"]
        end
        
        subgraph SRAnalysis["3️⃣ Static Reachability Analysis"]
            Semgrep["semgrep-core<br/>(Find imports)"]
            MetadataUse["Use Bundled SR-Metadata<br/>(library mappings)"]
        end
        
        Enrich["4️⃣ Enrich SBOM<br/>(Add reachability properties)"]
        Output["SBOM Output<br/>(gl-sbom-*.cdx.json)"]
        
        SBOM --> LangCheck
        LangCheck -->|Yes| SREnabled
        LangCheck -->|No| Output
        SREnabled -->|Yes| Semgrep
        SREnabled -->|No| Output
        
        Semgrep --> MetadataUse
        MetadataUse -->|reads| BundledMeta
        Semgrep --> Enrich
        Enrich --> Output
    end
    
    subgraph GitLabRails["GitLab Rails Platform"]
        Ingest["SBOM Ingestion Service"]
        VulnDB["Vulnerability Database<br/>(with reachability_status)"]
        GraphQL["GraphQL API<br/>(reachability field)"]
    end
    
    Output -->|artifact| Ingest
    Ingest --> VulnDB
    VulnDB --> GraphQL
    
    subgraph UI["Frontend UI"]
        VulnReport["Vulnerability Report<br/>(Reachability column)"]
        VulnDetails["Vulnerability Details<br/>(Reachability indicator)"]
    end
    
    GraphQL --> VulnReport
    GraphQL --> VulnDetails
    
    style Trigger fill:#4a90e2
    style PyPI fill:#50c878
    style Maven fill:#50c878
    style PyPIData fill:#fff3cd
    style MavenData fill:#fff3cd
    style BundledMeta fill:#fff3cd
    style SBOM fill:#e8f4f8
    style Semgrep fill:#ff6b6b
    style Enrich fill:#9b59b6
    style Output fill:#27ae60
    style Ingest fill:#3498db
    style VulnDB fill:#2c3e50
    style GraphQL fill:#16a085
    style VulnReport fill:#f39c12
    style VulnDetails fill:#f39c12

Reachability Status Values

Status Meaning
Reachable: Yes Analysis confirmed the dependency is imported
Reachable: Not Found Analysis could not find evidence that the dependency is imported
Reachable: Not Available Reachability analysis data are missing

The design intentionally avoids a “Not Reachable” status to prevent false negatives. If analysis cannot confirm usage, it defaults to “Not Found” rather than incorrectly marking a dependency as unused.

Metadata Scraper Infrastructure

For Python and Java, external metadata is required to map library names to importable module names:

graph TB
    subgraph PyPIScraping["Python Metadata Scraping "]
        PyPIJob["GitLab Pipeline Job"]
        PyPIStep1["1. Retrieve package names<br/>(from PMDB)"]
        PyPIStep2["2. Download latest<br/>distribution from pypi.org"]
        PyPIStep3["3. Extract library →<br/>module name mapping"]
        
        PyPIJob --> PyPIStep1
        PyPIStep1 --> PyPIStep2
        PyPIStep2 --> PyPIStep3
    end
    
    subgraph JavaScraping["Java Metadata Scraping "]
        JavaJob["GitLab Pipeline Job"]
        JavaStep1["1. Identify target packages:<br/>- Vulnerable packages (GLAD)<br/>- Popular packages (BigQuery)"]
        JavaStep2["2. Download from<br/>Maven Central"]
        JavaStep3["3. Extract library →<br/>import path mapping"]
        JavaStep4["4. Optimize with<br/>Radix tree structure"]
        
        JavaJob --> JavaStep1
        JavaStep1 --> JavaStep2
        JavaStep2 --> JavaStep3
        JavaStep3 --> JavaStep4
    end
    
    subgraph Storage["SR-Metadata Repository (Git)"]
        PyPIData["Python Metadata<br/>(library → module)"]
        JavaData["Java Metadata<br/>(library → import path)<br/>(Radix tree optimized)"]
        Versioned["Versioned & Compressed"]
    end
    
    PyPIStep3 -->|commit| PyPIData
    JavaStep4 -->|commit| JavaData
    PyPIData --> Versioned
    JavaData --> Versioned
    
    subgraph DockerBuild["DS Analyzer Docker Image Build"]
        Bundle["Bundle SR-Metadata<br/>into Docker image"]
    end
    
    Versioned -->|include| Bundle
    
    subgraph Runtime["Runtime (Pipeline Execution)"]
        Offline["Metadata available<br/>locally in container<br/>(offline support)"]
    end
    
    Bundle --> Offline
    
    style PyPIJob fill:#50c878
    style JavaJob fill:#50c878
    style PyPIStep1 fill:#e8f4f8
    style PyPIStep2 fill:#e8f4f8
    style PyPIStep3 fill:#e8f4f8
    style JavaStep1 fill:#e8f4f8
    style JavaStep2 fill:#e8f4f8
    style JavaStep3 fill:#e8f4f8
    style JavaStep4 fill:#e8f4f8
    style PyPIData fill:#fff3cd
    style JavaData fill:#fff3cd
    style Versioned fill:#fff3cd
    style Bundle fill:#4a90e2
    style Offline fill:#27ae60

Python metadata is scraped weekly from pypi.org through a GitLab pipeline job that:

  • Retrieves all PyPI package names from PMDB
  • Downloads the latest distribution for each package from pypi.org
  • Extracts the mapping between library names and module (import) names
  • Commits the data to the SR-metadata git repository

We intentionally scope this process to the latest version only, as the library-to-module name mapping rarely changes across versions.

Java metadata follows a similar weekly scraping schedule. However, Java Static Reachability requires mappings between library names and import paths, which presents a unique challenge: import paths can vary significantly between package versions, and collecting exhaustive data for all versions creates prohibitive storage requirements. To address this, we focus on two key subsets:

  • Vulnerable packages with known security advisories. Source is the GitLab Advisory DB.
  • The most popular Java packages

This targeted approach covers the majority of customer packages while keeping data manageable. Additionally, we employ a Radix tree structure to optimize storage efficiency, as Java import paths share long common prefixes that compress well with this data structure.

SBOM Enrichment Process

Reachability data are stored in the SBOM report using the gitlab:dependency_scanning_component:reachability property to each component:

{
  "bom-ref": "pkg:pypi/requests@2.28.0",
  "name": "requests",
  "version": "2.28.0",
  "properties": [
    {
      "name": "gitlab:dependency_scanning_component:reachability",
      "value": "in_use"
    }
  ]
}

The reachability property supports two values: in_use and not_found. During SBOM report ingestion, these values are mapped to Yes and Not Found respectively in the reachability field. If the property is absent, it indicates that Static Reachability analysis did not run, and the field displays Not Available. This can happen if static reachability analysis is not enabled or failed to execute or if the language is not supported.

Alternative Solutions

1. Import + Usage Analysis

Approach: Initially, the design included not just detecting if a dependency was imported, but also verifying that the imported library was actually called/used in the application code .

Why We Considered It:

  • More precise than import-only detection
  • Filters out dependencies that are imported but never invoked
  • Reduces false positives by confirming actual code execution

Why We Rejected It:

  • A dependency is imported and automatically some init code is executed.
  • Decided to implement it as part of function level static reachability
  • Simplifying the whole flow

Decision: We scoped the feature to import-level reachability only (is the library imported?). This provides the majority of the value with less complexity.

Key Insight: A dependency that is imported but not called is still a risk if it has a supply chain vulnerability. Import-level detection is sufficient for the primary use case of prioritizing remediation efforts.

Reference: Epic #15780 - Static Reachability Analysis - GA

2. Static Reachability as a Separate Preceding Job

Approach: During the Beta phase, Static Reachability was implemented as a separate CI job that ran before the Dependency Scanning job, with explicit job dependencies.

Why We Considered It:

  • Clear separation of concerns. Static reachability was performed by the SCA-to-sarif-matcher codebase.
  • Easier to debug individual components
  • Allows independent scaling of each job

Why We Rejected It:

  • Job orchestration complexity: Managing dependencies between jobs introduced fragility
  • Operational complexity: Due to heavy CI/CD customer customisations, operational failures could not be avoided.
  • User experience: Multiple jobs made CI/CD templates harder to understand and configure

Decision: We consolidated all functionality into a single Dependency Scanning job that handles:

  1. SBOM generation
  2. Static Reachability analysis (if enabled)
  3. SBOM enrichment

This eliminated job orchestration complexity and provided a cleaner user experience.

Reference: Spike #525958 - Monolithic vs Composable Architecture, Enabling Static Reachability as part of the pipeline

3. Client-Provided Docker Image (Experiment Phase)

Approach: During the Experiment phase, Static Reachability required users to provide a Docker image containing their application code to the SCA-to-sarif-matcher tool for analysis. This was required to find the mapping between library names and import paths/names.

Why We Considered It:

  • Simplified initial implementation
  • Leveraged existing Oxeye/Lightz-aio analysis engine
  • Reduced infrastructure requirements

Why We Rejected It:

  • User friction: Requiring users to provide a Docker image was a significant barrier to adoption

Decision: We implemented a metadata scraping approach where:

  1. We pre-scrape library-to-module mappings from PyPI and Maven Central
  2. Store this metadata in the SR-metadata Git repository
  3. Bundle metadata into the DS analyzer Docker image

Benefits of This Approach:

  • No user-provided Docker images required
  • Cleaner separation between data collection and analysis

Reference: Issue #520509 - Static reachability should be able to run without requiring a docker image

4. Analysis Engine Migration: Lightz-aio to semgrep-core

Approach: Initially, Static Reachability used the Lightz-aio analysis engine to perform cross-file call graph analysis. This required a separate sca-to-sarif-matcher tool for the SBOM enrichment.

Why We Considered It:

  • Leveraged proven proprietary analysis engine

Why We Rejected It:

  • Performance bottleneck: Lightz-aio’s cross-file capabilities made it very slow (40+ minutes for large Python projects like OpenAPI)
  • Limited flexibility: Difficult to integrate into the DS analyzer pipeline without job orchestration

Decision: We migrated to semgrep-core (open-source Semgrep) as the analysis engine and integrated it directly into the DS analyzer. This approach:

  1. Eliminated sca-to-sarif-matcher: Removed the intermediary tool entirely, reducing complexity
  2. Consolidated analysis: All Static Reachability logic now lives in the DS analyzer
  3. Improved performance: Single-file analysis is orders of magnitude faster than cross-file analysis
  4. Simplified integration: No separate job dependencies or tool orchestration needed

Trade-offs:

  • Accepted limitation: We may miss imports where the import statement and usage are in different files (rare edge case)
  • Benefit: Pipelines complete in minutes instead of hours, making the feature practical for real-world CI/CD workflows

Reference: Issue #537156 - Spike: Investigate opengrep/semgrep as an alternative for Lightz-aio

Future Work

Function-Level Static Reachability (FLSR)

Vision: Extend Static Reachability beyond import detection to function-level analysis, determining not just whether a dependency is imported, but whether the specific vulnerable function within that dependency is actually called.

Approach:

  • Track dependency functions are called from the application code
  • Only flag vulnerabilities in packages that are demonstrably used and a vulnerable function is called
  • Offline analyze vulnerable packages with a solution to identify the vulnerable function

Benefits:

  • Precision: Eliminates false positives from unused functions within imported libraries
  • Risk accuracy: Provides the most accurate risk assessment possible. A vulnerable libray doesn’t automatically means that the application code is affected.
  • Remediation prioritization: Enables even more granular prioritization of security work

Scope Limitations: To maintain practical performance and complexity, FLSR will focus on first-layer transitive dependencies only:

  • Direct dependencies: Analyze functions called from the application code
  • First-layer transitive dependencies: Analyze functions called by direct dependencies
  • Second-layer and beyond: Do not analyze deeper transitive chains (dependency of a transitive dependency)

Rationale for Scope:

  • Complexity explosion: Analyzing arbitrarily deep dependency chains creates exponential complexity
  • Performance: Limiting scope keeps analysis time reasonable for CI/CD pipelines
  • Maintainability: Reduces the surface area for bugs and edge cases

Reference: Epic #16844 - Function Level Static Reachability