Gitaly Team

What is Gitaly?

The Gitaly team is responsible for building and maintaining systems to ensure that the Git data storage tier of GitLab instances, and GitLab.com in particular, is reliable, secure and fast. For more information about Gitaly, see the README in the repository and the roadmap below.

The team includes Backend Engineers and SREs who collaborate to deliver reliable, scalable, and fast data storage to our customers.

Functional boundary

While GitLab is the primary consumer of the Gitaly project, Gitaly is a standalone product that can be used outside of GitLab. As such, we strive to maintain a functional boundary around Gitaly. The goal is to ensure that the Gitaly project provides an interface to manage Git data, but does not make business decisions about how to manage that data.

For example, Gitaly can provide a robust and efficient set of APIs to move Git repositories between storage solutions, but it would be up to the calling application to decide when such moves should occur.

Processes that are independent of business inputs (such as repository maintenance) should be fully contained within Gitaly, as they provide substantial value to anyone using the Gitaly project.

Roadmap

Please see the public product direction for Gitaly.

The vision and principles driving the roadmap can be found in the internal handbook.

The current roadmap is this epic board. See Roadmap planning below for how this is managed.

Featured upcoming large architectural changes

Stable Counterparts

The following members of other functional teams are our stable counterparts:

Name	Role
Ameya Darshan	Security Engineer, Deploy:Environments, Systems:Gitaly, Systems:Geo, Delivery, Snippets
Costel Maxim	Senior Security Engineer, Application Security, Plan (Project Management, Product Planning, Certify), Create:Source Code, Growth, Fulfillment:Purchase, Fulfillment:Provision, Fulfillment:Utilization, Systems:Gitaly
Evan Read	Senior Technical Writer, Software Supply Chain Security:Compliance, Manage:Import and Integrate, Systems:Distribution, Systems:Gitaly
Gerardo Gutierrez	Senior Support Engineer, Systems:Gitaly
John McDonnell	Senior Software Engineer in Test, Systems:Gitaly
Vasilii Iakliushin	Staff Backend Engineer, Create:Source Code, Systems:Gitaly API

How to contact the team

Urgent issues and outages

If you’re not part of the Support organization, please consider seeking help from them first – Support has better availability and can help in most common cases.

If you still need help, please file an issue here. Post it in #g_gitaly for more immediate visibility, and tag the EM and PM, the Support person you’re working with, and @tier2-oncall-gitaly to notify the Gitaly team member who is on call.

On Call Rotation

Gitaly on-call should only be paged by the following people:

SRE on-call or IMOC during production incidents only.
Support Engineers or Support Managers during customer emergencies.

Use /incident escalate on Slack for these cases, then select the Gitaly EOC under On-call teams. For all other cases, please file an issue as described under Customer issues.

Please do not page on-call outside of these cases. If you’re working on a customer emergency but not part of Support, please contact Support instead!

Rotation

The incident.io schedule is the source of truth for who is on-call.

The rotation is staffed during team members’ working hours (no weekends). Given the geographic distribution of team members, this still covers 24 hours on workdays, but without guarantees.

Weekends are explicitly out of scope (not staffed), and escalation must fall back to the current EOC rotation.
Given that responsibilities are only during working hours, there’s no additional compensation unless explicitly specified otherwise.
You can choose to take time in lieu via Workday, selecting the On-Call Time in Lieu option after a shift.

Expectations during on-call shift

Refer to the Responder Quick Start Guide for a streamlined onboarding process. Note: all escalations to the Gitaly team are made via incident.io.
A 15-minute response time to an incident.io page while on call. This does not apply to pings to the @tier2-oncall-gitaly Slack handle, which should be used to inform the Gitaly on-call of relevant happenings, but should not be used for emergencies.
- The on-call is expected to be available and reachable (but not necessarily actively working, as long as you can start the investigation within this SLO).
- If paged less than 15 minutes before the end of a shift, you still must respond and explicitly hand off the incident.
Serve as point of contact for questions in the #g_gitaly channel as well as new Request For Help issues.
- Acknowledge inquiries in the #g_gitaly channel on a best-effort basis.
- Triage new Request for Help issues: establish urgency and work with EM/PM to assign a milestone.
Ongoing production incidents and customer escalations are explicitly handed off by the outgoing on-call to the next Gitaly on-call using the incident channel on Slack.
Team members are responsible for finding coverage for PTO and holidays. To request cover, install the incident.io mobile app, navigate to Schedules, then tap the person icon with arrows.

Customer issues

Please file an issue here. Post it on #g_gitaly for more immediate visibility.

A note on customer escalations and engagements

We are happy to help when a customer needs it! But please keep in mind that we are primarily a development team, not equipped for “field engineering”.

Our engineers can help, preferably async, with:

deep, data-driven technical investigation and technical collaboration, in close partnership with Support and the CSM
providing product-level fixes or improvements, scheduled and released through the usual process under the direction of the EM and PM
improving our documentation if something’s unclear

Engineering Managers (@jcaigitlab) and Product Managers (@mjwood) are also happy to engage with customers if you need assistance clarifying roadmaps, product features and timelines, or to ensure the correct prioritization.

However, we are not a good fit if you need:

advice on GitLab instance configuration or architecture in self-hosted scenarios (Reference Architectures and Professional Services can help)
engagements without clear exit criteria (please clarify them first; “let’s jump on a call to discuss” usually falls into this category)
long-term “advise us” scenarios (please refer to Support and the documentation, or engage Professional Services)

This epic discusses how this engagement model might evolve.

Normal priority requests

To get the Gitaly team to work on something, it’s best to create an issue on the Gitaly issue tracker and add the group::gitaly and workflow::problem validation labels, along with any other appropriate labels. Then, feel free to tag the relevant Product Manager and/or Engineering Manager as listed above.

For information requests and other quick one-offs, feel free to use #g_gitaly on Slack to get attention on the issue.

Issues with `Infradev` labels

These are typically Corrective Actions or other follow-up items that have strict SLO tracking. The EM and/or PM schedule them through either of the above paths by polling the Infradev dashboards (see Handling issues with strict SLO below).

Training material

Roster management

Please refer to https://handbook.gitlab.com/handbook/engineering/on-call/#pagerduty for the mechanics (swapping on-call, adding new team members to the rotation).

Team Members

Name	Role
Divya Rani	Backend Engineer, Gitaly
Emily Chui	Senior Backend Engineer, Gitaly
Eric Ju	Senior Backend Engineer, Gitaly
James Liu	Senior Backend Engineer, Gitaly
John Cai	Engineering Manager, Gitaly
Mustafa Bayar	Backend Engineer, Gitaly
Olivier Campeau	Backend Engineer, Gitaly
Quang-Minh Nguyen	Staff Backend Engineer, Gitaly and Tenant Scale
Sami Hiltunen	Staff Backend Engineer, Gitaly
Timothy Schumacher	Backend Engineer, Gitaly

Working with product

Agile workflow in Gitaly

We generally follow the Product Development Flow to schedule and track our work.

Work is executed in small chunks (2-3 days of work), each tracked as an issue. This allows for natural “checkpoints” for safe context switching. Triaging and scheduling is separate from executing the current work. All incoming work is tracked and we are intentional about picking up new work.

Incoming work of all kinds (both projects and ad-hoc interrupts) passes through the EM and PM for triage. This may involve some engineering consultation about feasibility, fit with the product’s strategy and roadmap, etc. Some of it gets scheduled; some goes to the backlog. If the effort is not deemed necessary or is not believed to align with the roadmap, we close the issue with a comment explaining why it is not being pursued, for future reference.

We aim to scope milestones so that the task list is ambitious, but not overwhelming. We deliberately leave some capacity for incoming incidents. We want to avoid the feeling of a never-ending mountain of work, to promote a healthy work/life balance. It is also important to stress that milestones are recommendations only, and we work on a best-effort basis.

For issues with a strict SLO, we follow the process defined in Handling issues with strict SLO below.

We use the following workflow labels on the issues:

workflow::problem validation - A good spot for features that we may or may not want to pursue. This is where Product can do user interviews, cost analysis, market-fit analysis, etc. to decide whether it’s an opportunity we wish to pursue.
workflow::solution validation - Use this label for features or issues where Engineering needs to investigate or propose a solution, or break the work down into smaller issues.
workflow::planning breakdown - Issues ready to be scheduled in the next few milestones (unblocked or soon to be unblocked, with a known solution). Leaders of long-running (pre-approved) projects use this to communicate with the PM.
workflow::ready for development - Work that is scheduled for a milestone (either the current one, or a future one).
workflow::in dev - Actively being worked on by the Engineering team
workflow::in review - Work that is in review
workflow::verification - Code is in production and pending verification by the DRI engineer
workflow::complete - Changes are verified; the issue can be closed

Issues that we definitely want to prioritize for a release receive a Deliverable label and are moved to the top of the list. These Deliverable issues help show our commitment to GitLab and our customers around working on these issues.

Workflow

Project Work

The top level Gitaly epic contains linked epics representing projects the team is working on. Team members will either be the primary owner of an epic, or a supporting contributor. This way knowledge gets shared across the team.

DRI & Supporting contributors

The DRI of an epic will be responsible for making decisions regarding technical direction of a project. Making a decision will involve creating proposals and gathering feedback from peers and the Engineering Manager. It also involves reaching out and collaborating with stakeholders external to the team when applicable.

The DRI is also responsible for project management, which means keeping the epic up to date with relevant issues, removing issues that are no longer relevant, and writing weekly updates in the automatically generated comment in the epic, using the following format:

```
HIGH_LEVEL_SUMMARY

:tada: **achievements**:
-

:issue-blocked: **blockers**:
-

:arrow_forward: **next**:
-
```

The supporting contributor(s) of an epic are responsible for supporting the DRI by working on issues, reviewing MRs, and participating in technical discussions. A supporting contributor can also act as the DRI when the DRI is OOO, depending on their bandwidth.

Supporting contributors are highly recommended but optional. A project can also have multiple supporting contributors.

Not everyone needs to be a DRI, but everyone should be a supporting contributor on at least one project.

The structure of having both DRIs and supporting contributors does not introduce any hard requirements for moving MRs forward, as reviews and approvals can be done by anyone on the team.

Technical Roadmap, Customer Issues, and Cross Functional Issues

The Gitaly Technical Roadmap & Customer Issues board contains one-off issues that are not part of any project, but are important to address. These include technical roadmap issues, customer issues, and cross-functional work in Gitaly that other teams rely on. These issues are sorted by priority. Team members can pick up work from this board in addition to the issues they are working on as part of project epics.

As a rule of thumb, the ratio of project work to technical roadmap and customer issues should be roughly 70/30.

Urgent and high priority issues

P1/S1 issues should be treated with urgency. If such issues have not been scheduled, bias for action is encouraged. Go ahead and pull them into the current milestone, but do notify the EM and PM.

Blocked issues

If your work is blocked, use workflow::blocked and set a blocking issue for clarity. Then consider asking for help and/or helping to unblock another team member’s blocked work before picking up something else.

Issues blocked for a long time should be removed from this process by removing the milestone and unassigning them.

Adding more work for the team

Everyone can file new issues as more work is discovered, and feed them into this process. To do so, file an issue, tag the EM and PM, and apply workflow::planning breakdown without a milestone. Please explain both what needs to be done and why (i.e., the impact and urgency), and make it clear whether the work is ready to be picked up. (This is also how project DRIs add the next steps in their projects to the workflow.)

Roadmap planning

The current roadmap is this epic board. It consists of themes/projects running for a quarter or longer (in some cases, much longer). It is okay to add sub-projects directly to the roadmap in the latter case.

Anyone can propose a project: file an epic and discuss with the team (and EM+PM). Don’t forget the group::gitaly label.
Once accepted, we add the Roadmap label.
Ongoing roadmap items get roadmap::now, while roadmap::next and roadmap::later show what’s been triaged and pushed into the future for now.
At each quarterly planning:
- we review roadmap items (using arguments from the vision and principles, current business priorities etc)
- and then take on OKRs that push those goals forward.

Quarterly Planning

Quarterly planning is done before every quarter for the next 3 milestones, with input from everyone. At that time, we must already have a good idea of the work that needs to be done.

The process is as follows:

EM+PM (with input from engineers and stakeholders): decide the scope we’ll be working on, which will align with department level OKRs.
EM+PM+Engineers: Based on roadmap items, file smaller epics/issues if needed that can be completed in 3 milestones (i.e., one quarter). Tie them to the overall project epics. This is where we’ll track the actual work.
EM: Modify the top level Gitaly epic to reflect the work.
PM: Once the scope of the quarter is clear, take the list of issues and assign one of the three milestones, along with workflow::planning breakdown (for large issues in need of breakdown) or workflow::ready for development.
Engineers: help break down workflow::planning breakdown items and file smaller issues if needed, adding them to the same 3 milestones as reasonable. Raise exceptions as needed.

Handling issues with strict SLO

Issues with the Infradev label are typically Corrective Actions or other follow-up items that have strict SLO tracking. The EM and/or PM schedule them through either of the above paths by polling these dashboards:

Infradev Dashboard
Past due Infradev issues

EM+PM: Poll the dashboards at least weekly. Triage and schedule these issues so that SLOs can be met. If needed, move the issue to the Gitaly tracker, or file a proxy issue there so that it shows up on work boards, and mark it as blocking. Drag issues to the top of the workflow::ready for development column.
EM+PM: If the issue is blocked or depends on ongoing work, add a milestone that fits the SLO and the pending work (so that we don’t forget it). Ensure that blocking work gets scheduled first.
Engineers: please prioritize picking up this work, and post frequent updates in the original issue (at least weekly, even if nothing has changed). Mark any blocking issues as such.

Gitaly consumers

We use the #g_gitaly Slack channel for a constant flow of communication about planned changes, updates, and potential breaking changes. In this channel we provide updates for all teams using the service, and also ask for feedback and insights about planned changes or improvements.

To support this proactive communication, there is also an individual counterpart on the consumer side who helps with research in the codebases and coordinates with all the teams consuming Gitaly. The DRI on the consumer side is Igor Drozdov.

The Gitaly consumers are:

Gitaly Deprecations

Gitaly offers many customer-facing features. As such, all deprecations of customer-facing features follow the standard GitLab feature deprecations guidance and are announced on the deprecations documentation page.

Gitaly also offers many non-customer-facing features, which are used by GitLab and other customers who interface with Gitaly directly. These Gitaly-level deprecations are not announced using the above methods, as these features are not designed for GitLab end users to interface with directly. Examples of these non-customer-facing features are storage-level APIs, which should never be called by GitLab users.

Metrics

On gitlab.com

Incidents (not all pages are incidents)
Pages
Global Apdex
Alerts (S1/S2 are paging, S3/S4 are not)

Useful links

Debugging Gitaly
Actual pending Infradev issues (sort by group, focus on gitaly)
Out of SLO Infradev issues
Error budget
MR review workload

Dashboards

Team development

Onboarding

To complete team-specific onboarding, please file an issue here.

Offboarding

Revoke maintainer rights, and remove the developer from the gl-gitaly GitLab.com group so that they are no longer on the list of authorized approvers.


Last modified July 2, 2025: Add product dev folder and move relevant pages (83bfc789)
