The Gitaly team is responsible for building and maintaining systems to ensure
that the Git data storage tier of GitLab instances, and GitLab.com in particular,
is reliable, secure and fast. For more information about Gitaly, see the README in the repository and the roadmap below.
The team includes Backend Engineers and SREs collaborating to deliver reliable, scalable and fast data storage to our customers.
Functional boundary
While GitLab is the primary consumer of the Gitaly project, Gitaly is a standalone product which can be used external to GitLab. As such, we strive to achieve a functional boundary around Gitaly. The goal of this is to ensure that the Gitaly project creates an interface to manage Git data, but does not make business decisions around how to manage the data.
For example, Gitaly can provide a robust and efficient set of APIs to move Git repositories between storage solutions, but it would be up to the calling application to decide when such moves should occur.
Processes fully independent of business inputs (such as repository maintenance) should be fully contained within Gitaly as they provide substantial value to anyone using the Gitaly project.
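As a rough illustration of this boundary, the sketch below separates a mechanism-only interface from a caller-side policy. It is not Gitaly's actual API: `RepositoryMover`, `InMemoryMover`, `rebalance`, the storage names and the size threshold are all hypothetical, chosen only to show where the decision lives.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Protocol


class RepositoryMover(Protocol):
    """Hypothetical mechanism-only interface: it moves data, it never decides when."""

    def move(self, repo: str, target_storage: str) -> None: ...


@dataclass
class InMemoryMover:
    """Toy stand-in for a storage service; maps repository -> storage name."""

    location: Dict[str, str] = field(default_factory=dict)

    def move(self, repo: str, target_storage: str) -> None:
        self.location[repo] = target_storage


def rebalance(mover: RepositoryMover, sizes_gb: Dict[str, float], limit_gb: float) -> List[str]:
    """Caller-side policy (hypothetical): move repositories above a size limit.

    The threshold and the target storage are business decisions, so they live
    in the calling application rather than behind the interface.
    """
    moved = []
    for repo, size in sorted(sizes_gb.items()):
        if size > limit_gb:
            mover.move(repo, "large-repo-storage")
            moved.append(repo)
    return moved


mover = InMemoryMover({"group/app": "default", "group/monorepo": "default"})
moved = rebalance(mover, {"group/app": 0.4, "group/monorepo": 120.0}, limit_gb=50.0)
```

Swapping in a different policy (for example, moving by age instead of size) would only change `rebalance`; the interface the storage side exposes stays the same.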
If you’re not part of the Support organization, please consider seeking help from them first – Support has better availability and can help in most common cases.
If you still need help, please file an issue here. Post it on #g_gitaly for more immediate visibility, and tag the EM, the PM, and the Support person you’re working with.
On Call Rotation
Gitaly on-call should only be paged by the following people:
SRE on-call or IMOC during production incidents only.
Support Engineers or Support Managers during customer emergencies.
For these cases, use /pd trigger on Slack, then select the Gitaly rotation.
For all other cases please file an issue under Customer issues.
Please do not page on-call outside of these cases. If you’re working on a
customer emergency but not part of Support, please contact Support instead!
The rotation is staffed during working hours of team members (no weekends). This still covers 24h of workdays, given the distribution of team members, but without guarantees.
Weekends are explicitly out of scope (not staffed), and escalation must fall back to the current EOC rotation.
Given that responsibilities are only during working hours, there’s no additional compensation unless explicitly specified otherwise.
You can choose to take time in lieu via Workday, selecting the On-Call Time in Lieu option after a shift.
Expectations for On-call
Provide technical assistance for ONLY the cases described above
Respond to a PagerDuty page within 15 minutes while on-call. This does not apply to pings to the @gitaly-oncall Slack handle, which should be used to inform the Gitaly on-call of relevant happenings but not for emergencies.
The on-call is expected to be available and reachable, but not necessarily actively working, as long as they can start the investigation within this SLO.
If paged less than 15 minutes before the end of a shift, you still must respond and explicitly hand off the incident.
Ongoing production incidents and customer escalations are explicitly handed off by the outgoing on-call to the next Gitaly on-call using the incident channel on Slack.
Team members are responsible for finding coverage for PTO and Holidays.
Customer issues
Please file an issue here. Post it on #g_gitaly for more immediate visibility.
A note on customer escalations and engagements
We are happy to help when a customer needs it! But please keep in mind that we are primarily a development team, not equipped for “field engineering”.
We are a good fit for:
deep technical investigation based on data and able technical collaboration, in close partnership with Support and CSM
providing product-level fixes or improvements, work to be scheduled and results released as usual, under direction of EM and PM
improving our documentation if something’s unclear
Engineering Managers (@jcaigitlab) and Product Managers (@mjwood) are also happy to engage with customers if you need assistance clarifying roadmaps, product features and timelines, or to ensure the correct prioritization.
However, we are not a good fit if you need:
advice on GitLab instance configuration or architecture in self-hosted scenarios (Reference Architectures and Professional Services can help)
engagements without clear exit criteria (please clarify them first, “let’s jump on a call to discuss” is usually in this category)
long-term “advise us” scenarios (please refer to Support and the documentation, or engage Professional Services)
This epic discusses possible development of this engagement model.
Normal priority requests
To get the Gitaly team to work on something, it’s best to create an issue on the Gitaly issue tracker
and add the group::gitaly and workflow::problem validation labels,
along with any other appropriate labels. Then, feel free to tag the relevant
Product Manager and/or Engineering Manager as listed above.
For information requests and other quick one-offs, feel free to use #g_gitaly on Slack to get attention on the issue.
Work is executed in small chunks (2-3 days of work), each tracked as an issue. This allows for natural “checkpoints” for safe context switching.
Triaging and scheduling are separate from executing the current work. All incoming work is tracked and we are intentional about picking up new work.
Incoming work of all kinds (both projects and ad-hoc interrupts) passes through EM and PM for triage. There may be some engineering consultation here about feasibility,
fit with the product’s strategy, roadmap, etc. Some work will get scheduled and some will go to the backlog. If the effort is not deemed necessary or not believed
to align with the roadmap, we will close the issue with a comment explaining why it is not being pursued, for future reference.
We aim to scope milestones such that we have a task list that is ambitious, but not overwhelming. We deliberately leave some capacity for incoming incidents.
We want to avoid the feeling of a never-ending mountain of work to promote a healthy work / life balance.
It is also important to stress that milestones are recommendations only and we work on a best-effort basis.
For issues with a strict SLO, we follow the process defined below.
We use the following workflow labels on the issues:
workflow::problem validation - A good spot to put features that we may / may not want to pursue. This is where product can do some user interviews, cost analysis, market fit, etc to decide if it’s an opportunity we wish to pursue.
workflow::solution validation - Use this label for features / issues where Engineering needs to investigate / propose a solution going forward, or break it down into smaller issues.
workflow::planning breakdown - Issues ready to be scheduled in the next few milestones (unblocked or soon unblocked, with a known solution). Leaders of long-running (pre-approved) projects use this to communicate with PM.
workflow::ready for development - Work that is scheduled for a milestone (either the current one, or one in the future).
workflow::in dev - Actively being worked on by the Engineering team.
workflow::in review - Work that is in review.
workflow::verification - Code is in production and pending verification by the DRI engineer.
workflow::complete - Changes are verified, issue can be closed.
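For tooling that needs to reason about these labels, a minimal sketch follows. It assumes the labels form a strict linear sequence in the order listed above, which is a simplification for illustration only (other labels such as workflow::blocked sit outside it), and `next_label` is a hypothetical helper, not part of any GitLab API.

```python
from typing import Optional

# Labels in the order listed above. Treating them as a strict linear
# sequence is an assumption of this sketch, not a documented rule.
WORKFLOW_LABELS = [
    "workflow::problem validation",
    "workflow::solution validation",
    "workflow::planning breakdown",
    "workflow::ready for development",
    "workflow::in dev",
    "workflow::in review",
    "workflow::verification",
    "workflow::complete",
]


def next_label(current: str) -> Optional[str]:
    """Return the label that follows `current`, or None at the end of the list."""
    index = WORKFLOW_LABELS.index(current)  # raises ValueError for unknown labels
    if index + 1 < len(WORKFLOW_LABELS):
        return WORKFLOW_LABELS[index + 1]
    return None
```

In practice issues can skip stages or move backwards, so a helper like this is only useful as a default suggestion.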
Issues that we definitely want to prioritize for a release receive a Deliverable label and are moved to the top of the list.
The Deliverable label signals to GitLab and our customers that we are committed to working on these issues.
Workflow
Project Work
The top-level Gitaly epic
contains linked epics representing projects the team is working on. Team members
will either be the DRI of an epic or
a supporting contributor. This way knowledge gets
shared across the team.
DRI & Supporting contributors
The DRI of an epic
will be responsible for making decisions
regarding technical direction of a project. Making a decision will involve
creating proposals and gathering feedback from peers and the Engineering
Manager. It also involves reaching out and collaborating with stakeholders
external to the team when applicable.
The DRI is also responsible for project management, which means
keeping the epic up to date with relevant issues, removing issues that are no
longer relevant, and writing weekly updates in the automatically generated comment
in the epic with the following format:
The supporting contributor(s) of an epic will be responsible for supporting the
DRI in working on issues, reviewing MRs, and participating in technical
discussions. A supporting contributor can also act as the DRI when the
DRI is OOO, depending on their bandwidth.
Supporting contributors are highly recommended but optional. There can also be
multiple supporting contributors for a project.
Not everyone needs to be a DRI, but everyone should be a supporting contributor
on at least one project.
The structure of having both DRIs and supporting contributors does not introduce
any hard requirements for moving MRs forward, as reviews and approvals can be
done by anyone on the team.
Technical Roadmap, Customer Issues, and Cross Functional Issues
The Gitaly Technical Roadmap & Customer Issues
board contains one-off issues that are not part of any project, but are important
issues to address. These include technical roadmap
issues, customer issues, and cross functional work in Gitaly that other teams
rely on. These issues will be sorted by priority. Team members can pick up work
from this board in addition to issues they are working on as part of project
epics.
As a rule of thumb, the ratio of project work to technical
roadmap and customer issues should be roughly 70/30.
Urgent and high priority issues
P1/S1 issues should be treated with urgency. If such issues have not been
scheduled, bias for action is encouraged.
Go ahead and pull them into the current milestone, but do notify the EM and PM.
Blocked issues
If your work is blocked, use workflow::blocked and set a blocking issue for
clarity. Then consider asking for help and/or helping to unblock another team
member’s blocked work before picking up something else.
Issues blocked for a long time should be removed from this process by removing the milestone and unassigning.
Adding more work for the team
Everyone can file new issues as more work is discovered, and feed them into this
process. To do so, file an issue, tag EM and PM, and assign workflow::planning breakdown without a milestone. Please explain both what needs to be done and
why (i.e. the impact and urgency), and make it clear whether the work is ready
to be picked up. (This is also how project DRIs add the next steps in their
projects to the workflow.)
Meta
A weekly call is held between the product manager and engineering managers (of
both Gitaly and Git teams). Everyone is welcome to join and these calls are
used to discuss any roadblocks, concerns, status updates, deliverables, or other
thoughts that impact the group.
Roadmap planning
The current roadmap is this epic board. It consists of themes/projects running for a quarter or longer (in some cases, much longer). It is okay to add sub-projects directly to the roadmap in the latter case.
Anyone can propose a project: file an epic and discuss with the team (and EM+PM). Don’t forget the group::gitaly label.
Once accepted, we add the Roadmap label.
Ongoing roadmap items get roadmap::now, while roadmap::next and roadmap::later show what’s been triaged and pushed into the future for now.
At each quarterly planning:
we review roadmap items (using arguments from the vision and principles, current business priorities, etc.)
and then take on OKRs that push those goals forward.
Quarterly Planning
Quarterly planning is done before every quarter for the next 3 milestones, with
input from everyone. At that time, we must already have a good idea of the work
that needs to be done.
The process is as follows:
EM+PM (with input from engineers and stakeholders): decide the scope we’ll be
working on, which will align with department level OKRs.
EM+PM+Engineers: Based on roadmap items, file smaller epics/issues if needed
that can be completed in 3 milestones (i.e. one quarter). Tie them to the
overall project epics. This is where we’ll track the actual work.
PM: Once the scope of the quarter is clear, take the list of issues and
assign one of the three milestones, along with workflow::planning breakdown (for large issues in need of breakdown) or workflow::ready for development.
Engineers: help break down workflow::planning breakdown items and file
smaller issues if needed, adding them to the same 3 milestones as reasonable.
Raise exceptions as needed.
Handling issues with strict SLO
Issues with the Infradev label are typically Corrective Actions or other follow-up items that have strict
SLO tracking. They will be scheduled through either of the above paths, by EM
and/or PM polling these dashboards:
EM+PM: Poll the dashboards at least weekly. Triage and schedule these issues so that SLOs can be met. If needed, move the issue to the Gitaly tracker, or file a proxy issue there so that it shows up on work boards, and mark it as blocking. Drag issues to the top of the workflow::ready for development column.
EM+PM: If the issue is blocked or depends on ongoing work, add a Milestone that fits the SLO and the pending work (so that we don’t forget it). Ensure that blocking work gets scheduled before.
Engineers: please prioritize picking up this work, and post frequent updates (at least weekly, even if there are no changes) in the original issue. Mark any blocking issues as such.
Gitaly consumers
To keep a constant flow of communication about planned changes, updates and
potentially breaking changes, we have the #g_gitaly Slack channel. In the
channel we provide updates for all teams using the service, and also ask
for feedback and insights about planned changes or improvements.
To support this proactive communication, there is also an individual
counterpart on the consumer side to help with research in the codebases and
coordination with all the teams consuming Gitaly. The DRI on the consumer side is Igor Drozdov.
Gitaly also offers many non-customer facing features, which are used by GitLab and other customers who directly interface with Gitaly. These Gitaly level deprecations will not be announced using the above methods as they are not designed for GitLab end users to interface with directly. Some examples of these non-customer facing features are storage level APIs, which should never be called by GitLab users.
This document is intended to help Gitaly engineers become familiar with GitLab’s production layout and gain the ability to effectively debug production problems. While the focus is on SaaS, many of the skills also transfer to debugging self-managed instances.
Generic GitLab background
Skim / read the following, focusing on an overview then on Gitaly: