Secret Detection as a platform-wide experience

This page contains information related to upcoming products, features, and functionality. It is important to note that the information presented is for informational purposes only. Please do not rely on this information for purchasing or planning purposes. The development, release, and timing of any products, features, or functionality may be subject to change or delay and remain at the sole discretion of GitLab Inc.
Status: ongoing
Authors: theoretick, vbhat161, ahmed.hemdan
Coach: theoretick
DRIs: connorgilbert, amarpatel
Owning Stage: devops secure
Created: 2022-11-25

Summary

Today’s secret detection feature is built around containerized scans of repositories within a pipeline context. This scope is narrow compared to the range of places where leaked or compromised tokens can appear, and it should be expanded considerably.

Secret detection as a platform-wide experience encompasses detection across platform features with high risk of secret leakage, including repository contents, job logs, and project management features such as issues, epics, and MRs.

Motivation

Goals

  • Support platform-wide detection of tokens to avoid secret leaks
  • Prevent exposure by rejecting detected secrets
  • Provide scalable means of detection without harming end user experience
  • Maintain a unified list of token patterns and masking

See target types for scan target priorities.

Non-Goals

Phase 1 is limited to detection and alerting across the platform, with rejection only during pre-receive Git interactions and browser-based detection.

Secret revocation and rotation is also beyond the scope of this new capability.

Scanned object types beyond the scope of this MVC are included within target types.

Management UI

Development of an independent interface for managing secrets is out of scope for this blueprint. Any detections will be managed using the existing Vulnerability Management UI.

Management of detected secrets will remain distinct from the Secret Management feature capability, as “detected” secrets are categorically distinct from actively “managed” secrets. When a detected secret is identified, it has already been compromised due to its presence in the target object (for example, a repository). In contrast, managed secrets should be stored with stricter standards for secure storage, including encryption and masking when visible (such as in job logs or in the UI).

As a long-term priority we should consider unifying the management of the two secret types. However, that work is out of scope for the current blueprint’s goals, which remain focused on active detection.

Target types

Target object types refer to the scanning targets prioritized for detection of leaked secrets.

In order of priority this includes:

  1. non-binary Git blobs under 1 megabyte
  2. job logs
  3. issuable creation (issues, MRs, epics)
  4. issuable updates (issues, MRs, epics)
  5. issuable comments (issues, MRs, epics)

Targets out of scope for the initial phases include:

  • non-binary Git blobs over 1 megabyte
  • binary Git blobs
  • Media types (JPEG, PDF, …)
  • Snippets
  • Wikis
  • Container images
  • External media (YouTube platform videos)
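
The in-scope criteria for Git blobs listed above can be expressed as a small predicate. This is a minimal sketch, not the shipped implementation: `scannable_blob?` and `MAX_BLOB_BYTES` are hypothetical names, and treating any blob that contains a NUL byte as binary is an assumed heuristic.

```ruby
# Hypothetical size limit matching the "under 1 megabyte" target above.
MAX_BLOB_BYTES = 1024 * 1024

# Returns true when a blob is in scope for the initial phases:
# under the size limit and not (heuristically) binary.
def scannable_blob?(data)
  bytes = data.b # binary-encoded copy, so byte checks ignore text encoding
  return false if bytes.bytesize >= MAX_BLOB_BYTES

  # Assumed heuristic: a NUL byte marks the content as binary.
  !bytes.include?("\x00".b)
end
```

Filtering before the scan keeps out-of-scope blobs from consuming any regex time at all.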

Token types

The existing Secret Detection configuration covers 100+ rules across a variety of platforms. To reduce the total cost of execution and the likelihood of false positives (FPs), the dedicated service targets only well-defined, low-FP tokens.

Token types to identify in order of importance:

  1. Well-defined GitLab tokens (including Personal Access Tokens and Pipeline Trigger Tokens)
  2. Verified Partner tokens (including AWS)
  3. Well-defined low-FP third party tokens
  4. Remainder tokens currently included in Secret Detection analyzer configuration

A well-defined token is a token with a precise definition, most often a fixed substring prefix (or suffix) and fixed length.
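
To make the definition concrete, here is a minimal sketch of a pattern with a fixed prefix and a fixed-length body. The prefix, character set, and length are assumptions chosen for illustration and are not taken from the actual ruleset; `WELL_DEFINED_TOKEN` and `well_defined_token?` are hypothetical names.

```ruby
# Illustrative pattern only: a fixed prefix followed by exactly 20 body
# characters. The negative lookahead rejects longer runs, so the length
# really is fixed. Prefix and length are assumptions for this example.
WELL_DEFINED_TOKEN = /glpat-[0-9A-Za-z_-]{20}(?![0-9A-Za-z_-])/

def well_defined_token?(text)
  WELL_DEFINED_TOKEN.match?(text)
end
```

A pattern this specific rarely matches ordinary text, which is what keeps its false-positive rate low.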

We have good domain understanding of our own GitLab tokens, and we have verified the accuracy of partner-provided patterns by collaborating with those partners.

Classifying a token as observed low-FP relies on user reports and dismissal reports. Once this data issue is delivered we will have aggregate FP rates, but at present the classification rests primarily on user-reported data.

To minimize false positives, there are no plans to detect or alert on high-entropy, arbitrary strings such as 3lsjkw3a22.

Uniformity of rule configuration

Rule pattern configuration should remain centralized in the secrets analyzer’s packaged gitleaks.toml configuration, vendored to the monolith for Phase 1, and checksum-checked to ensure it matches the specific release version to avoid drift. Each token can be filtered by tags to form both high-confidence and blocking groupings. For example:

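require 'toml-rb'

# Select the rules tagged for Phase 1 blocking from the vendored ruleset.
prereceive_blocking_rules = TomlRB.load_file('gitleaks.toml')['rules'].select do |r|
  tags = r.fetch('tags', [])
  tags.include?('gitlab_blocking_p1') && tags.include?('gitlab_blocking')
end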
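
The checksum check mentioned above could look roughly like the following. This is a sketch under assumptions: `verify_ruleset!` and `RulesetDriftError` are hypothetical names, SHA-256 is an assumed digest choice, and the expected digest is assumed to be pinned alongside the analyzer release version.

```ruby
require 'digest'

# Raised when the vendored ruleset does not match the pinned release checksum.
RulesetDriftError = Class.new(StandardError)

# Compares the SHA-256 of the file at `path` against `expected_sha256`.
# Returns true on a match and raises RulesetDriftError otherwise.
def verify_ruleset!(path, expected_sha256)
  actual = Digest::SHA256.file(path).hexdigest
  return true if actual == expected_sha256

  raise RulesetDriftError,
        "ruleset checksum mismatch: expected #{expected_sha256}, got #{actual}"
end
```

Running a check like this in CI, or when the ruleset is loaded, would surface drift between the vendored copy and the analyzer release early.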

Auditability

A critical aspect of both secret detection and suppression is administrative visibility. With each phase we must include audit capabilities (events or logging) to enable event discovery.

Proposal

The first iteration of the experimental capability will feature a blocking pre-receive hook implemented in the Rails application. This iteration will be released in an experimental state to select users and will give the team an opportunity to profile the capability before considering extraction into a dedicated service.

In the future state, to achieve scalable secret detection for a variety of domain objects, a dedicated scanning service must be created and deployed alongside the GitLab distribution. This is referred to as the SecretScanningService.

This service must be:

  • highly performant
  • horizontally scalable
  • generic in domain object scanning capability

Platform-wide secret detection should be enabled by default on GitLab SaaS as well as self-managed instances.

Decisions

Challenges

  • Secure authentication to GitLab.com infrastructure
  • Performance of scanning against large blobs
  • Performance of scanning against volume of domain objects (such as push frequency)
  • Queueing of scan requests

Transfer optimizations for large Git data blobs

As described in Gitaly’s upload-pack traffic blueprint, we have faced problems in the past handling large data transfers over gRPC. This could be a concern as we expand secret detection to large blob sizes to increase coverage over leaked secrets. We expect to roll out pre-receive scanning with a 1 megabyte blob size limit, which should be well within boundaries. From the Protocol Buffers documentation:

As a general rule of thumb, if you are dealing in messages larger than a megabyte each, it may be time to consider an alternate strategy.

In expansion phases we must explore chunking or alternative strategies like the optimized sidechannel approach used by Gitaly.
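
As one illustration of the chunking idea, the sketch below slices a blob into overlapping windows so that a token straddling a boundary still appears whole in at least one window. It is an assumption-laden example rather than a planned design: `overlapping_chunks` is a hypothetical helper, and the guarantee only holds for tokens no longer than `overlap` bytes.

```ruby
# Returns [offset, slice] pairs covering `data`. Consecutive slices overlap
# by `overlap` bytes, so any token up to `overlap` bytes long is fully
# contained in at least one slice.
def overlapping_chunks(data, chunk_size:, overlap:)
  raise ArgumentError, 'overlap must be smaller than chunk_size' unless overlap < chunk_size

  bytes = data.b # slice on bytes so multibyte characters cannot break offsets
  step = chunk_size - overlap
  chunks = []
  offset = 0
  loop do
    chunks << [offset, bytes.byteslice(offset, chunk_size)]
    break if offset + chunk_size >= bytes.bytesize

    offset += step
  end
  chunks
end
```

The cost of the overlap is that some bytes are scanned twice, so findings would need de-duplication by absolute offset.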

Design and implementation details

The detection capability relies on a multiphase rollout, from an experimental component implemented directly in the monolith to a standalone service capable of scanning text blobs generically.

The implementation of the secret scanning service is highly dependent on the outcomes of our benchmarking and capacity planning against both GitLab.com and Reference Architectures. As the scanning capability must be an on-by-default component of both our SaaS and self-managed instances, each iteration’s deployment characteristics define whether the service acts as a standalone component or is executed as a subprocess of the Rails architecture (mirroring the implementation of our Elasticsearch indexing service).

See technical discovery for further background exploration.

See this thread for past discussion around scaling approaches.

Detection engine

Our current secret detection offering uses Gitleaks for all secret scanning in pipeline contexts. By using its --no-git configuration we can scan arbitrary text blobs outside of a repository context and continue to use it for non-pipeline scanning.

Changes to the detection engine are out of scope until benchmarking reveals performance concerns.

For the long-term direction of GitLab Secret Detection, the scope is greater than that of the Gitleaks tool. As such, we should consider feature encapsulation to limit the Gitleaks domain to the relevant build context only.

In the case of pre-receive detection, we rely on a combination of keyword/substring matches for pre-filtering and re2 for regex detections. See spike issue for initial benchmarks.
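
The pre-filtering step can be sketched as follows. This is an illustration only: Ruby's built-in `Regexp` stands in for re2, and the `Rule` struct, the `extok-` rule, and `matching_rule_ids` are hypothetical and not part of the real ruleset or gem.

```ruby
# Hypothetical rule shape: a cheap keyword list plus the full pattern.
Rule = Struct.new(:id, :keywords, :pattern, keyword_init: true)

RULES = [
  Rule.new(id: 'example_prefixed_token',
           keywords: ['extok-'],
           pattern: /extok-[0-9a-f]{16}/)
].freeze

# Returns the ids of rules that match `text`. The regex only runs when one
# of the rule's keywords appears in the text, so most rules are skipped for
# most blobs at the cost of a substring search.
def matching_rule_ids(text, rules = RULES)
  rules.select do |rule|
    rule.keywords.any? { |kw| text.include?(kw) } && rule.pattern.match?(text)
  end.map(&:id)
end
```

Substring search is far cheaper than regex evaluation, which is why the keyword check comes first.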

Notable alternatives include high-performance regex engines such as Hyperscan or its portable fork Vectorscan. These systems may be worth exploring in the future if our performance characteristics show a need to grow beyond the existing stack. For now, the team’s velocity in building an independently scalable and generic scanning engine was prioritized. See ADR 001 for more on the implementation language considerations.

Organization-level Controls

Configuration and workflows should be oriented around Organizations. Detection controls and governance patterns should support configuration across multiple projects and groups in a uniform way that emphasizes shared allowlists, organization-wide policies (for example, disablement of push option bypass), and auditability.

Each phase documents the paradigm used as we iterate from Instance-level to Organization-level controls.

Phase 1 - Ruby pushcheck pre-receive integration

The critical paths as outlined under goals above cover two major object types: Git text blobs (corresponding to push events) and arbitrary text blobs. In Phase 1, we focus entirely on Git text blobs.

The detection flow for push events relies on subscribing to the PreReceive hook to scan commit data using the PushCheck interface. The SecretScanningService fetches the specified blob contents from Gitaly, scans the commit contents, and rejects the push when a secret is detected. See Push event detection flow for the sequence.

In the case of a push detection, the commit is rejected inline and an error is returned to the end user.
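
The reject-inline behavior can be sketched in a few lines. This is a simplified illustration, not the PushCheck implementation: `check_push!`, `Blob`, and `PushRejectedError` are hypothetical names, and the scanner is any callable that returns a list of findings.

```ruby
# Raised to reject a push; the message is what the end user would see.
PushRejectedError = Class.new(StandardError)

# Hypothetical blob record: an identifier plus its text content.
Blob = Struct.new(:id, :data, keyword_init: true)

# Scans every blob with `scanner` and rejects the push on the first blob
# that produces a finding. Returns true when no secrets are found.
def check_push!(blobs, scanner:)
  blobs.each do |blob|
    findings = scanner.call(blob.data)
    next if findings.empty?

    raise PushRejectedError, "rejected: secret found in blob #{blob.id}"
  end
  true
end
```

Failing on the first finding keeps rejection latency low, at the cost of reporting only one secret per push attempt.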

Configuration

This phase will be considered “experimental” with limited availability for customer opt-in, through instance level application settings.

High-Level Architecture

The Phase 1 architecture involves no additional components and is entirely encapsulated in the Rails application server. This provides a rapid deployment with tight integration within auth boundaries and no distribution coordination.

The primary drawback is resource utilization: this approach adds CPU, memory, transfer volume, and request latency to existing application nodes.

@startuml Phase1
skinparam linetype ortho

card "**External Load Balancer**" as elb #6a9be7

together {
  card "**GitLab Rails**" as gitlab #32CD32
  card "**Gitaly**" as gitaly #FF8C00
  card "**PostgreSQL**" as postgres #4EA7FF
  card "**Redis**" as redis #FF6347
  card "**Sidekiq**" as sidekiq #ff8dd1
}

gitlab -[#32CD32]--> gitaly
gitlab -[#32CD32]--> postgres
gitlab -[#32CD32]--> redis
gitlab -[#32CD32]--> sidekiq

elb -[#6a9be7]-> gitlab

gitlab .[#32CD32]----> postgres
sidekiq .[#ff8dd1]----> postgres

@enduml

Push Event Detection Flow

sequenceDiagram
    autonumber
    actor User
    User->>+Workhorse: git push with-secret
    Workhorse->>+Gitaly: tcp
    Gitaly->>+Rails: PreReceive
    Rails->>-Gitaly: ListAllBlobs
    Gitaly->>-Rails: ListAllBlobsResponse

    Rails->>+GitLabSecretDetection: Scan(blob)
    GitLabSecretDetection->>-Rails: found

    Rails->>User: rejected: secret found

    User->>+Workhorse: git push without-secret
    Workhorse->>+Gitaly: tcp
    Gitaly->>+Rails: PreReceive
    Rails->>-Gitaly: ListAllBlobs
    Gitaly->>-Rails: ListAllBlobsResponse

    Rails->>+GitLabSecretDetection: Scan(blob)
    GitLabSecretDetection->>-Rails: not_found

    Rails->>User: accepted

Gem Scanning Interface

For Phase 1, we use the private Secret Detection Ruby Gem, which is invoked by the Secrets Push Check on the GitLab Rails platform.

In addition to scanning multiple blobs, the private SD gem supports the following:

  • Configurable timeouts for the entire scan and for each blob.

  • The ability to run the scan within a subprocess instead of the main process. The number of processes spawned per request is capped at 5.
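
The two timeout levels could be combined roughly as shown below. This sketch does not reflect the gem's actual interface: `scan_with_timeouts` and its parameters are hypothetical, the scan-level budget is tracked with a monotonic deadline, and each blob gets the smaller of its own limit and the remaining budget.

```ruby
require 'timeout'

# Scans each blob with `scanner`, bounding both the whole scan and each
# blob. Returns a hash of blob index => findings, or :timed_out for blobs
# that exceeded their limit or were reached after the scan budget ran out.
def scan_with_timeouts(blobs, scanner:, scan_timeout:, blob_timeout:)
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + scan_timeout

  blobs.each_with_index.map do |data, index|
    remaining = deadline - Process.clock_gettime(Process::CLOCK_MONOTONIC)
    next [index, :timed_out] if remaining <= 0

    begin
      limit = [blob_timeout, remaining].min
      [index, Timeout.timeout(limit) { scanner.call(data) }]
    rescue Timeout::Error
      [index, :timed_out]
    end
  end.to_h
end
```

Recording a per-blob status instead of aborting lets the caller decide whether a timed-out blob should block the push or pass through.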

The ruleset file referred to during the Pre-receive Secret Detection scan is located here.

More details about the Gem can be found in the README file. Also see ADR 002 for more on how the Gem code is stored and distributed.

Phase 2 - Standalone Secret Detection service

This phase emphasizes scaling the service outside of the monolith for general availability, isolating the feature’s resource consumption, and easing maintainability. The critical paths as outlined under goals above cover two major object types: Git text blobs (corresponding to push events) and arbitrary text blobs. In Phase 2, we continue to focus on Git text blobs.

The responsibility of the service will be limited to running Secret Detection scans on the given set of input blobs. More details about the service are outlined in ADR 004: Secret Detection Scanner Service.

The introduction of a dedicated service impacts the workflow for Secret Push Protection as follows:

sequenceDiagram
    autonumber
    %% Phase 2: Iter 1
    Gitaly->>+Rails: invokes `/internal/allowed` API endpoint
    Rails->>Rails: Perform project eligibility checks
    alt On access check failure
        Rails-->>Gitaly: Scanning Skipped
    end
    Rails->>Gitaly: Fetch blobs
    Gitaly->>Rails: Quarantined Blobs
    Rails->>Secret Detection Service: Invoke scan by embedding blobs
    Secret Detection Service->>Secret Detection Service: Runs Secret Detection on input blobs
    Secret Detection Service->>Rails: Result
    Rails->>Gitaly: Result

The Secret Detection service addresses the previous phase’s limitations of feature scalability and shared-resource consumption. However, the Secret Push Protection workflow still requires the Rails monolith to load a large number of Git blobs fetched from Gitaly into its own memory before passing them to the Secret Detection Service.

Phase 2.1 - Invoke Push Protection directly from Gitaly

Through the previous phase, running pre-receive checks involves multiple hops between Gitaly and Rails. For Secret Push Protection in particular, a fairly large amount of Rails memory is occupied holding Git blobs so they can be passed to the Gem/Service for scanning. This problem can be mitigated through direct interaction between the Secret Detection service and Gitaly via a standard interface (either a custom pre-receive hook or Gitaly’s new Plugin-based architecture). This setup removes the need for Rails to act as a blob messenger between Gitaly and the service.

Gitaly’s new Plugin-based architecture is the preferred interface between Gitaly and the RPC service, as it provides streamlined access to the Git blob repository. However, the Gitaly team has yet to take it up for development.

More details on Phase 2.1 will be added once there are updates on the development of Plugin architecture.

Phase 3 - Expansion beyond Push Protection service

The detection flow for arbitrary text blobs, such as issue comments, relies on subscribing to Notes::PostProcessService (or an equivalent service) to enqueue Sidekiq requests to the SecretScanningService, which processes the text blob identified by the object type and primary key of the domain object. The SecretScanningService fetches the relevant text blob, scans the contents, and notifies the Rails application when a secret is detected.

The detection flow for job logs requires processing the log during archive to object storage. See discussion in this issue around scanning during streaming and the added complexity in buffering lookbacks for arbitrary trace chunks.

In the case of a push detection, the commit is rejected and an error is returned to the end user. In any other case of detection, the Rails application manually creates a vulnerability using the Vulnerabilities::ManuallyCreateService to surface the finding in the existing Vulnerability Management UI.
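
The arbitrary-text flow described in this section can be sketched with in-memory stand-ins. Nothing here is the real worker or service API: `ScanJob`, `TextScanWorker`, and the injected `fetchers`, `scanner`, and `reporter` callables are all hypothetical. In a real deployment the job would be enqueued through Sidekiq and the reporter would be the step that creates a vulnerability.

```ruby
# A job identifies the text to scan by object type and primary key.
ScanJob = Struct.new(:object_type, :primary_key, keyword_init: true)

class TextScanWorker
  # `fetchers` maps an object type to a callable that loads text by key.
  # `scanner` returns a list of findings for a text.
  # `reporter` is called with the job and findings when a secret is detected.
  def initialize(fetchers:, scanner:, reporter:)
    @fetchers = fetchers
    @scanner = scanner
    @reporter = reporter
  end

  # Fetches the text for the job, scans it, and reports any findings.
  def perform(job)
    text = @fetchers.fetch(job.object_type).call(job.primary_key)
    findings = @scanner.call(text)
    @reporter.call(job, findings) unless findings.empty?
    findings
  end
end
```

Passing only the type and key, rather than the text itself, keeps queued payloads small and lets the worker read the latest content at scan time.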

Configuration

This phase will be considered “generally available” and on-by-default, with disablement configuration through organization-level settings.

High-Level Architecture

There is no change to the architecture defined in Phase 2, however the individual load requirements may require scaling up the node counts for the detection service.

Push Event Detection Flow

There is no change to the push event detection flow defined in Phase 2. However, the added capability to scan arbitrary text blobs directly from Rails allows us to emulate pre-receive behavior for issuable creations as well (see target types for priority object types).

sequenceDiagram
    autonumber
    actor User
    User->>+Workhorse: git push with-secret
    Workhorse->>+Gitaly: tcp
    Gitaly->>+GitLabSecretDetection: PreReceive
    GitLabSecretDetection->>-Gitaly: ListAllBlobs
    Gitaly->>-GitLabSecretDetection: ListAllBlobsResponse

    Gitaly->>+GitLabSecretDetection: PreReceive

    GitLabSecretDetection->>GitLabSecretDetection: Scan(blob)
    GitLabSecretDetection->>-Gitaly: found

    Gitaly->>+Rails: PreReceive

    Rails->>User: rejected: secret found

    User->>+Workhorse: POST issuable with-secret
    Workhorse->>+Rails: tcp
    Rails->>+GitLabSecretDetection: PreReceive

    GitLabSecretDetection->>GitLabSecretDetection: Scan(blob)
    GitLabSecretDetection->>-Rails: found

    Rails->>User: rejected: secret found

Future Phases

These are key items for delivering a feature-complete always-on experience but have not yet been prioritized into phases.

Large blob sizes (1 MB+)

Current phases do not include expansion of blob sizes beyond 1 MB. The limit was chosen mainly to conform to RPC transfer limits, but in future iterations we should support larger blobs. This can be achieved in two ways:

  1. Post-receive processing

    Accept blobs in a non-blocking fashion, scan them in a background job, and alert passively when a secret is detected.

  2. Improvements to scanning logic batching

    Maintaining the 1 MB constraint is primarily future-proofing to match an expected transport protocol. The constraint can be lifted by using a separate transport (HTTP, reads from disk, …) or by slicing blobs into smaller pieces.

Detection Suppression

Suppression of detection and action on leaked secrets will be supported at several levels.

  1. Global suppression - If a secret is highly likely to be a false token (for example, EXAMPLE), it should be suppressed in workflow contexts where the user would be seriously inconvenienced.

    We should still provide some means of triaging these results, whether via audit events or as automatic vulnerability resolution.

  2. Organization suppression - If a secret matches an organization’s allowlist (or was previously flagged and remediated as irrelevant) it should not reoccur. See Organization-level controls.

  3. Inline suppression - Inline annotations should be supported in later phases, along with an Organization-level setting to ignore annotations.
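
The three suppression levels above could be evaluated in order, as in this minimal sketch. All names are hypothetical: `suppression_for`, the `INLINE_MARKER` string, and the rule that a secret containing EXAMPLE counts as a false token are assumptions made for illustration.

```ruby
# Hypothetical inline marker; the real annotation syntax is not defined here.
INLINE_MARKER = 'secret-detection:ignore'

# Returns the suppression level that applies to a finding, or nil when the
# finding should be acted on. Levels are checked from broadest to narrowest.
# `honor_inline` models the organization-level setting to ignore annotations.
def suppression_for(secret:, line:, org_allowlist: [], honor_inline: true)
  return :global if secret.include?('EXAMPLE') # assumed false-token rule
  return :organization if org_allowlist.include?(secret)
  return :inline if honor_inline && line.include?(INLINE_MARKER)

  nil
end
```

Returning the level rather than a boolean lets the caller record why a finding was suppressed, which supports the triage and audit needs noted above.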

External Token Verification

As a post-processing step for detection we should explore verification of detected secrets. This requires processors per supported token type in which we can distinguish tokens that are valid leaks from false positives. Similar to our automatic response to leaked secrets, we must externally verify a given token to give a high degree of confidence in our alerting.

There are two token types, internal and external:

  • Internal tokens are verifiable and revocable as part of the ScanSecurityReportSecretsWorker
  • External tokens require external verification, in which the architecture will closely match the Secret Revocation Service

Iterations


GitLab Secret Detection ADR 001: Use Ruby Push Check approach within monolith

Context

There are a number of concerns around the performance of secret detection using a regex-based approach at scale. The primary considerations include transfer latency between nodes and both CPU and memory bloat. These concerns manifested in two ways: the language to be used for performing regex matching and the deployment architecture.

The original discussion in the exploration issue covers many of these concerns and background.

Implementation language

The two primary languages considered were Ruby and Go.

GitLab Secret Detection ADR 002: Store the Secret Detection Gem in the same repository

Context

During Phase 1, we opted for using the Ruby-based push check approach to block secrets from being committed to a repository, and as such the scanning of secrets was performed by a library (or a Ruby gem) developed internally within GitLab for this specific purpose.

As part of creating this library and making it available for use within the Rails monolith, we had to decide on the best way to distribute it.

GitLab Secret Detection ADR 003: Run scan within subprocess

Context

During the spike conducted to evaluate regex engines for Pre-receive Secret Detection, Ruby using the RE2 library came out on top. Although Ruby has acceptable regex performance, the language has certain pitfalls, such as higher memory consumption and a lack of parallelism. Ruby supports multi-threading and Ractors (3.1+), but these are suitable for running I/O-bound operations in parallel, not CPU-bound operations.

One of the concerns with running the Pre-receive Secret Detection feature in the critical path is memory consumption, especially by the regex operations involved in the scan. In a scan with 300+ regex-based rule patterns running on every line of the commit blobs, memory usage could reach ~2-3x the size of the commit blobs. The occupied memory is not released when the scan operation completes; it is held until the Garbage Collector triggers. Eventually, the servers might run out of memory.
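
The approach named in this ADR's title, running the scan in a subprocess, can be sketched with `fork` and a pipe. This is a simplified illustration rather than the gem's implementation: `run_in_subprocess` is a hypothetical helper, it assumes a platform where `fork` is available, and it omits error handling and the cap on concurrent processes.

```ruby
# Runs the given block in a forked child and returns its result to the
# parent. Memory allocated by the block belongs to the child and is
# returned to the OS when the child exits, instead of waiting on the
# parent's garbage collector.
def run_in_subprocess
  reader, writer = IO.pipe
  pid = fork do
    reader.close
    writer.binmode
    writer.write(Marshal.dump(yield)) # serialize the block's result
    writer.close
    exit!(0) # skip at_exit handlers inherited from the parent
  end
  writer.close
  reader.binmode
  payload = reader.read # read to EOF before waiting, to avoid a full pipe
  reader.close
  Process.wait(pid)
  Marshal.load(payload)
end
```

A call such as `run_in_subprocess { scan(blobs) }`, where `scan` is whatever scanning routine is in use, would keep the regex working set out of the long-lived parent process.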

GitLab Secret Detection ADR 004: Secret Detection Scanner Service

Context

In Phase 2 of Secret Push Protection, the goal is to have a dedicated service responsible for running Secret Detection scans on the given input blobs. This is done primarily for scalability. Regex operations in the Secret Detection scan consume significant resources, so running scans within Rails or Gitaly instances would reduce the resources available for other operations. Running scans in isolation provides greater control over resource allocation and allows the service to be scaled independently as needed.

GitLab Secret Detection ADR 005: Use Runway for service deployment

Context

The Secret Detection Service requires a strategy for running automated deployments through the GitLab CI environment.

Proposed Solution: Runway

We could use Runway - a GitLab internal Platform as a Service, which aims to enable teams to deploy and run their services quickly and safely.

Platform Tooling Support