Reduce the growth rate of pipeline data

This page contains information related to upcoming products, features, and functionality. It is important to note that the information presented is for informational purposes only. Please do not rely on this information for purchasing or planning purposes. The development, release, and timing of any products, features, or functionality may be subject to change or delay and remain at the sole discretion of GitLab Inc.
Status Authors Coach DRIs Owning Stage Created
ongoing fabiopitino mbobin fabiopitino grzesiek jreporter cheryl.li devops verify 2024-05-27

Problem to solve

TODO

Strategies

Delete pipeline processing data

Once a build gets archived, it is no longer possible to resume pipeline processing in such pipeline. It means that all the metadata, we store in PostgreSQL, that is needed to efficiently and reliably process builds can be safely moved to a different data store.

Storing pipeline processing data is expensive as this kind of CI/CD data represents a significant portion of data stored in CI/CD tables. Once we restrict access to processing archived pipelines, we can move this metadata to a different place - preferably object storage - and make it accessible on demand, when it is really needed again (for example for compliance or auditing purposes).

We need to evaluate whether moving data is the most optimal solution. We might be able to use de-duplication of metadata entries and other normalization strategies to consume less storage while retaining ability to query this dataset. Technical evaluation will be required to find the best solution here.

Epic: Reduce the rate of builds metadata table growth.

Last modified November 1, 2024: Remove trailing spaces (6f6d0996)