Object Storage Working Group

The aim of the GitLab Object Storage Working Group is to help improve the performance and security of our current object storage solution and to reduce its technical debt.

Attributes

Property | Value
Date Created | November 3, 2021
Target End Date | May 31, 2022
Slack | #wg_object-storage (only accessible from within the company)
Google Doc | Object Storage Working Group Meeting Agenda (only accessible from within the company)

Charter

GitLab stores three classes of user data: database records, Git repositories, and user uploaded files.

Both the user experience and the contributor experience with our file storage have significant room for improvement.

  • The initial GitLab setup requires creating and configuring 13 buckets instead of just 1.
  • Features using file storage require contributors to think about both local storage and Object Storage, which leads to friction and complexity. This often results in broken features and security issues.
  • People contributing to file storage often also have to write code for Workhorse, Omnibus, and CNG.

The working group will reduce the technical debt accrued over the past few years, namely by removing CarrierWave and by no longer duplicating object storage clients in both Go and Ruby.

The working group is tasked with architecting a simplified Object Storage process and implementing the new solution.

Business goal

Improve SaaS scalability, reliability, and development speed by making sure object storage is available for every type of upload.

Improve feature adoption for self-managed customers by providing a single-bucket configuration that works out of the box.

Object storage is a key feature in GitLab that affects engineering groups across all sections. The outcome of the working group should also make it easier for engineers to contribute to the final solution.

Scope and definitions

Object storage is a fundamental component of GitLab, providing the underlying implementation for shared, distributed, highly-available (HA) file storage.

Over time, we have built support for object storage across the application, solving specific problems in a multitude of iterations. This has led to increased complexity across the board, from development (new features and bug fixes) to installation:

  • New GitLab installations require the creation and configuration of several object storage buckets instead of just one, as each group of features requires its own. This affects the installation experience and new feature adoption, and moves us further away from boring solutions.
  • The release of cloud native GitLab required removing NFS shared storage and developing direct upload, a feature that was extended, milestone after milestone, to several types of uploads but never enabled globally.
  • Today GitLab supports both local storage and object storage. Local storage only works on single-box installations or with NFS, which we no longer recommend to our users and no longer use on GitLab.com.
  • Understanding all the moving parts and the flow is extremely complicated: CarrierWave, Fog, and the Go S3/Azure SDKs are all in use, which also complicates testing.
  • Fog and CarrierWave are not maintained to the level of the native SDKs (e.g. AWS S3 SDK), so we end up having to maintain or monkey patch those tools to support requested customer features (e.g. https://gitlab.com/gitlab-org/gitlab/-/issues/242245) that would normally be “free”.
  • In many cases, we copy around object storage files needlessly (e.g. https://gitlab.com/gitlab-org/gitlab/-/issues/285597). Large files (LFS, packages, etc.) are slow to finalize or don’t work at all as a result.

Definitions

CarrierWave

A gem that provides a simple and extremely flexible way to upload files from Ruby applications. This was the boring solution when first implemented. However, this is no longer our use case: we upload files from Workhorse, and we had to patch CarrierWave's internals to support direct upload.

Direct upload

A technology we developed to intercept file uploads with Workhorse and handle the expensive upload operation in Workhorse, where it’s cheaper. See our uploads development documentation for more details.

Kickoff video

Exit criteria (100%)

The overarching goal should be to define improvements that can be made to the Object Storage implementation and to make informed implementation proposals through the work of this group. As such, we intend to:

Out of scope

  • Make final decisions on proposed solutions.
  • Implement all proposed solutions.
  • Be a permanent custodian for or oversee Object Storage development in the future.

Outcome

At the beginning of this working group, we had three main areas of improvement: consolidating object storage files into a single bucket, reducing code complexity, and removing local storage.

However, it took us very little time to figure out that the biggest challenge for the working group members was understanding the current implementation and being able to speak a common language.

The working group led an effort to collect and categorize all the usages of object storage in the product. This resulted in a shared understanding of the problem, a renewed Uploads Development Guide, and the removal of features such as Pseudonymizer and background uploads.

Consolidating object storage files into a single bucket and removing local storage support were assessed by the working group as excellent ways to reduce code complexity and simplify product installation and maintenance. However, those topics require more significant cross-department decisions that do not fit the working group's scope. As a first iteration, the working group members addressed how to reduce code complexity by focusing on technological challenges.

The creation of the scalability frameworks team while this working group was active provided a perfect partner to give continuity to this effort. Epic gitlab-com/gl-infra&733 describes the current roadmap.

Roles and responsibilities

The functional leads will be responsible for:

  • Representing the needs of individual stakeholders in their department/sub-dept.
  • Gathering and consolidating feedback on specific proposals from their department/sub-dept.
  • Communicating the output from the working group (if any) and answering questions from their dept/sub-dept.

Ideally, the functional lead is someone who is an IC working in the affected groups, but anyone capable of representing a group, department, or sub-department in the fashion mentioned above is welcome.

Working Group Role | Person | Stakeholder Dept. | Title
Executive Sponsor | Marin Jankovski @marin | Infrastructure | Director of Infrastructure, Platform
Facilitator | Alessio Caiazza @nolith | Infrastructure | Staff Backend Engineer
Functional Lead | Grzegorz Bizon @grzesiek | Ops, Verify | Staff Backend Engineer
Functional Lead | Jason Plum @WarheadsSE | Distribution | Staff Backend Engineer
Functional Lead | Matthias Käppler @mkaeppler | Memory | Senior Backend Engineer
Functional Lead | Łukasz Korbasiewicz @lkorbasiewicz | Support | Support Engineer
Member | Vladimir Shushlin @vshushlin | Release group | Senior Backend Engineer
Member | Erick Bajao @iamricecake | Verify | Senior Backend Engineer
Member | Jaime Martinez @jaime | Package | Backend Engineer
Member | David Fernandez @10io | Package | Senior Backend Engineer
Member | Tiger Watson @tigerwnz | Configure | Senior Backend Engineer
Member | Vitor Meireles De Sousa @vdesousa | AppSec | Senior Application Security Engineer
Member | Patrick Bajao @patrickbajao | Workhorse | Senior Backend Engineer
Member | Catalin Irimie @cat | Geo | Senior Backend Engineer
Member | Sofia Vistas @svistas | Quality | Senior Software Engineer in Test
Member | Jacob Vosmaer @jacobvosmaer-gitlab | Scalability | Staff Backend Engineer

Company efforts on uploads

At GitLab we work in iterations: direct upload was developed incrementally by several teams, adding new features over the course of several milestones.

To demonstrate the number of teams and milestones involved, the following timeline outlines Object Storage development, from feature development to tech debt and security fixes: