Data Access Sub Department

Vision

The Data Access sub-department is responsible for the sustainability and availability of access to GitLab’s user data, in alignment with customer needs and GitLab’s business objectives.

The scope covers user data stored in Git, PostgreSQL, ClickHouse, Redis and Object Storage, as well as the development of a scalable backup system for all GitLab deployments.

For all GitLab deployments:

  1. We design, operate and evolve GitLab’s data storage architecture and interfaces, or provide assistance to those responsible.
  2. We guide feature owners in reaching business goals safely, throughout the feature life cycle.
  3. We aid customers directly in incidents or escalations, and indirectly by innovating to meet their needs.

It is the job of each Data Access team to hold feature owners accountable for responsible access patterns and to thereby ensure the stability of our shared data storage systems. This is an active process and requires building relationships for collaboration, guiding through paved paths, and providing tools and knowledge Team Members can use and build on.

About sustainability and availability

Here, sustainability means long-term maintainability, efficiency, and scalability of our data storage systems and the software architecture that uses them. Good sustainability requires good up-front planning as well as continued adaptation as features, business goals and infrastructure evolve. Effects of new additions and changes must be considered in the context of the entire GitLab application and the storage infrastructure.

Availability means that critical user journeys continuously provide a great user experience, during normal operations as well as during state transitions such as migrations or upgrades. We must design our architecture, processes and tools such that they minimize interruptions and quality degradations.

Achieving the vision

What we do

  1. Own and drive GitLab’s relevant sustainability goals end to end, holding each other and feature owners accountable.
  2. Measure key metrics of data scalability and access patterns in our services, and track how they change and how close they are to breaking points.
  3. Publish tools to attribute usage of shared resources to their sources (for example, tie growth of a metric or a database query to a given product feature).
  4. Define the “paved paths” (good patterns to follow when storing and accessing data) through documentation, consultation, processes, and frameworks.
  5. Actively detect and prevent non-scalable patterns from entering GitLab as early in the development cycle as possible, through processes and automated tooling.
  6. Drive the work to make existing patterns sustainable, as they are discovered.
  7. Innovate, test, deploy, and migrate to infrastructure changes that contribute to long-term sustainability.
  8. Build defense-in-depth technologies to keep storage services available (such as load shedding and request isolation).
  9. Collaborate with other Data Access teams closely. Share knowledge, ideas, concepts and best practices to foster innovation, and deliver consistent solutions to customers.
  10. Measure the impact of our actions, set targets, and report on progress.
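
As an illustration of the load shedding mentioned in point 8, here is a minimal sketch of a concurrency-based load shedder. It is not GitLab's implementation: the names `LoadShedder`, `try_acquire`, `release`, `handle` and `max_in_flight` are hypothetical and chosen only for this example. The idea is that a storage service refuses new work once too many requests are already in flight, rather than letting a backlog build up and degrade every caller.

```python
import threading
from typing import Callable


class LoadShedder:
    """Rejects new work once too many requests are already in flight.

    Hypothetical sketch: names and limits are assumptions for illustration.
    """

    def __init__(self, max_in_flight: int) -> None:
        if max_in_flight < 1:
            raise ValueError("max_in_flight must be at least 1")
        self._max_in_flight = max_in_flight
        self._in_flight = 0
        self._lock = threading.Lock()

    def try_acquire(self) -> bool:
        """Reserve a slot; return False (shed the request) when none is free."""
        with self._lock:
            if self._in_flight >= self._max_in_flight:
                return False
            self._in_flight += 1
            return True

    def release(self) -> None:
        """Free a slot previously reserved with try_acquire()."""
        with self._lock:
            if self._in_flight > 0:
                self._in_flight -= 1


def handle(shedder: LoadShedder, query: Callable[[], str]) -> str:
    """Run the query if capacity allows, otherwise reject immediately."""
    if not shedder.try_acquire():
        return "rejected"
    try:
        return query()
    finally:
        shedder.release()
```

Using one such limiter per feature, instead of a single global one, would be a crude form of request isolation: a feature that exhausts its own slots is rejected without consuming the capacity reserved for others.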

FY26 goals

(in alignment with GitLab’s product principles and the [INTERNAL] Three year (FY26-FY28) - Platforms strategy)

  1. Identify historical storage architecture issues and create a mitigation plan/roadmap (epic).
  2. Establish a framework (automation, processes, information) to ensure scalability of new launches (epic).

Principles for launches

Below is a non-exhaustive list of considerations from the perspective of new features. The responsibility to exercise good judgement remains with the domain experts AND the feature owners. (This description uses terminology from RFC 2119.)

Each of these points MUST be considered for all GitLab installation types: Cells, Dedicated, SaaS, and Self-Managed:

  1. Growth of a feature over time MUST NOT endanger the service as a whole.
  2. Failure of a feature MUST NOT endanger the service as a whole.
  3. Safeguards SHOULD be architectural failsafes (such as isolation or the circuit breaker pattern), not reactive mechanisms.
  4. The critical path of operating a feature MUST be fully automated. (For example, humans watching graphs and reacting is not allowed.)
  5. Specific observability (monitoring and alerting) MUST be in place, to pinpoint and attribute sources of load and growth.
  6. Data ownership plans MUST include the entire lifecycle of the data, including:
    1. Backup and restoration plans
    2. Data retention policies that are tied to business goals and consider all potential legal implications (such as those concerning personally identifiable information, PII)
    3. Replication plans (Geo)
    4. Cost analysis
    5. Compatibility with existing data management features, like export and import
  7. Data for any user-facing feature MUST reside in, and be accessed through, a data store corresponding to the feature maturity.
    1. For example, a General Availability launch REQUIRES data to be stored in, and accessed through, a production-quality Infrastructure-owned data store.
  8. Changes MUST have documented rollout plans.
  9. The points above MUST be reconsidered each time a feature experiences a lifecycle change (launch, significant growth, change in maturity or scope, sunsetting) before the lifecycle change can take place. The responsibility to revisit belongs to feature owners, aided by Data Access experts.
  10. Exceptions to any of the above MUST be thoroughly and permanently documented with risk assessment and business considerations, and approved by Senior Manager, Data Access or above, and the appropriate Product counterpart(s).
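
To make the circuit breaker pattern named in point 3 concrete, here is a minimal sketch. It is an illustration under stated assumptions, not GitLab's implementation: `CircuitBreaker`, `CircuitOpenError`, `failure_threshold` and `reset_after_seconds` are hypothetical names. After a number of consecutive failures the breaker "opens" and refuses calls outright, so a failing feature stops adding load to the shared data store. Once a cool-down has elapsed, a single trial call is allowed through to check whether the dependency has recovered.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitOpenError(RuntimeError):
    """Raised when a call is refused because the circuit is open."""


class CircuitBreaker:
    """Hypothetical sketch of the circuit breaker pattern."""

    def __init__(
        self,
        failure_threshold: int,
        reset_after_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._failure_threshold = failure_threshold
        self._reset_after_seconds = reset_after_seconds
        self._clock = clock  # injectable so the cool-down can be tested
        self._consecutive_failures = 0
        self._opened_at: Optional[float] = None

    def call(self, operation: Callable[[], T]) -> T:
        if self._opened_at is not None:
            elapsed = self._clock() - self._opened_at
            if elapsed < self._reset_after_seconds:
                # Open: fail fast without touching the dependency.
                raise CircuitOpenError("circuit is open; call refused")
            # Cool-down elapsed: half-open, let this one trial call through.
        try:
            result = operation()
        except Exception:
            self._consecutive_failures += 1
            trial_failed = self._opened_at is not None
            if trial_failed or self._consecutive_failures >= self._failure_threshold:
                # Open (or re-open) the circuit and restart the cool-down.
                self._opened_at = self._clock()
            raise
        # Success closes the circuit and clears the failure count.
        self._consecutive_failures = 0
        self._opened_at = None
        return result
```

The safeguard is architectural in the sense of point 3: the protection is part of the call path and needs no human to watch a graph and react.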

All Team Members

The following people are permanent members of teams that belong to the Data Access Sub-department:

Database Framework

The Database Framework team develops solutions for scalability, application performance, data growth and developer enablement, especially where it concerns interactions with the database.

Name                    Role
Alex Ives               Backend Engineering Manager, Database
Backend Engineer        Backend Engineer, Database
Jon Jenkins             Senior Backend Engineer, Database
Krasimir Angelov        Staff Backend Engineer, Database
Leonardo da Rosa        Backend Engineer, Database
Matt Kasa               Staff Backend Engineer, Database
Maxime Orefice          Senior Backend Engineer, Database
Prabakaran Murugesan    Senior Backend Engineer, Database
Simon Tomlinson         Staff Backend Engineer, Database

Database Operations

The Database Operations team builds, runs, and owns the entire lifecycle of the PostgreSQL database engine for GitLab.com.

Name               Role
Rick Mar           Engineering Manager, Database Reliability
Alexander Sosna    Senior Database Reliability Engineer
Ben Prescott       Staff Support Engineer
Biren Shah         Senior Database Reliability Engineer
Jon Sisson         Senior Site Reliability Engineer
Rafael Henchen     Senior Database Reliability Engineer
Zoe Braddock       Site Reliability Engineer

Durability

The Durability team is dedicated to safeguarding and securing customer data that is stored by the GitLab application, and to setting guidelines for data access. We strive to build and maintain resilient infrastructure and improve the management of Redis, Sidekiq, and Gitaly.

Name                  Role
John 'Jarv' Jarvis    Staff Site Reliability Engineer
Ahmad Sherif          Senior Site Reliability Engineer
Furhan Shabir         Senior Site Reliability Engineer
Gabriel Mazetto       Senior Backend Engineer
Ian Baum              Senior Backend Engineer
Kyle Yetter           Senior Backend Engineer
Gregorius Marco       Backend Engineer, Scalability
Matt Smiley           Staff Site Reliability Engineer, Scalability
Pravar Gauba          Site Reliability Engineer
Raynard Omongbale     Site Reliability Engineer

Gitaly

The Gitaly team builds and maintains systems to ensure Git data of GitLab instances, and GitLab.com in particular, is reliable, secure and fast.

Name                 Role
John Cai             Engineering Manager, Gitaly
Divya Rani           Backend Engineer, Gitaly
Emily Chui           Senior Backend Engineer, Gitaly
Eric Ju              Senior Backend Engineer, Gitaly
James Fargher        Senior Backend Engineer, Gitaly
James Liu            Senior Backend Engineer, Gitaly
Mustafa Bayar        Backend Engineer, Gitaly
Olivier Campeau      Backend Engineer
Quang-Minh Nguyen    Staff Backend Engineer, Gitaly and Tenant Scale
Sami Hiltunen        Staff Backend Engineer, Gitaly
Tim Schumacher       Backend Engineer, Gitaly

Git

The Git team develops Git in accordance with the goals of the community and GitLab, and integrates it into our products.

Name                  Role
Christian Couder      Staff Backend Engineer, Git
Justin Tobler         Senior Backend Engineer, Git
Karthik Nayak         Senior Backend Engineer, Git
Patrick Steinhardt    Acting Engineering Manager, Git
Toon Claes            Senior Backend Engineer, Git

Data Access Durability Team

Mission

The mission of the Durability team is to safeguard and secure customer data that is stored by the GitLab application, and to set guidelines for data access. We strive to build and maintain resilient infrastructure and improve the management of Redis, Sidekiq, and Gitaly.

Ownership

The team has ownership over the following areas of the GitLab product:

  1. Reliable backup and restore solutions for all environments where GitLab is deployed.
  2. Data management and performance for Sidekiq and Redis.
  3. Infrastructure support for the Gitaly service.

Services

Durability is responsible for infrastructure that supports the following GitLab application services:

Database Framework Group

Vision

Developing solutions for scalability, application performance, data growth and developer enablement, especially where it concerns interactions with the database.

Mission

Focusing on the database, our mission is to provide solutions that allow us to scale to our customers’ demands, to provide tooling that proactively identifies performance bottlenecks and informs developers early in the development lifecycle, and to increase the number of database maintainers while sharing database best practices with community contributors and development teams within GitLab.

Database Operations Team (formerly known as the Database Reliability Engineering (DBRE) team)

Mission

The mission of the Database Operations team at GitLab is to Build, Run, Own and Evolve the entire lifecycle of the PostgreSQL database engine for GitLab.com.

The team is focused on owning the reliability, scalability, performance and security of the database engine and its supporting services. The team should seek to build its services on top of Production Engineering::Foundations services and cloud vendor managed products, where appropriate, to reduce complexity, improve efficiency and deliver new capabilities more quickly.

Git Team

Mission statement

The Git team is responsible for building, maintaining and providing expertise on the Git version control system. Its main responsibilities include:

  • Develop the Git version control system upstream.
  • Provide expertise to other teams at GitLab.
  • Foster the Git community.
  • Ensure the long-term viability of the Git project.

Upstream development

The Git team is responsible for driving the upstream development of Git both in accordance with the goals of the community and to address GitLab-specific needs as raised by other teams. This falls into the following broad categories:

Gitaly Team

What is Gitaly?

The Gitaly team is responsible for building and maintaining systems to ensure that the Git data storage tier of GitLab instances, and GitLab.com in particular, is reliable, secure and fast. For more information about Gitaly, see the README in the repository and the roadmap below.

The team includes Backend Engineers and SREs collaborating to deliver reliable, scalable and fast data storage to our customers.

Functional boundary

While GitLab is the primary consumer of the Gitaly project, Gitaly is a standalone product that can be used outside of GitLab. As such, we strive to achieve a functional boundary around Gitaly. The goal is to ensure that the Gitaly project provides an interface to manage Git data, but does not make business decisions about how that data is managed.