Engineering
The GitLab Product team determines the “What” (features) and “Why” (product strategy) of the platform, and Engineering determines the “How” (technical implementation) and “When” (scheduling) of platform releases. This page describes how we do engineering at GitLab.
Engineering Direction
GitLab has a Three-Year Strategy, and we’re excited to see every member of the Engineering division contribute to achieving it. Whether you’re creating something new or improving something that already exists, we want you to feel empowered to bring your best ideas for influencing the product direction through improved scalability, usability, resilience, and system architectures. And when you feel like you need to expand your knowledge in a particular area, know that you’re supported in having the resources to learn and improve your skills.
Our focus is to make sure that GitLab is enterprise grade in all its capabilities and to support the AI efforts required to successfully launch AI features to General Availability.
Making sure that GitLab is enterprise grade involves several teams collaborating on improving our disaster recovery and support offerings through ongoing work with GitLab Dedicated and Cells infrastructure. Our goal here is improved availability and service recovery.
Engineering Culture
Engineering culture at GitLab encompasses the processes, workflows, principles
and priorities that all stem from our GitLab Values.
All these things continuously strengthen our engineering craftsmanship and
allow engineers to achieve engineering excellence, while growing and having a
significant, positive impact on the product, people, and the company as a whole.
Our engineering culture is carried and evolved primarily through knowledge
sharing and collaboration. Everyone can be part of this process because at
GitLab everyone can contribute.
Engineering Excellence
Engineering excellence can be defined as an intrinsic motivation to improve
engineering efficiency and software quality, and to deliver better results
while building software products. Engineering excellence is fueled by a strong
engineering culture combined with a mission: to build better software that
allows everyone to contribute.
Engineering Initiatives
Engineering is the primary advocate for the performance, availability, and security of the GitLab project. Product Management prioritizes 60% of engineering time, so everyone in the engineering function should participate in the Product Management prioritization process to ensure that our project stays ahead in these areas. Engineering prioritizes 40% of time on initiatives that improve the product, underlying platform, and foundational technologies we use.
Work in the 40% time budget should be coordinated and prioritized by the Engineering Manager of a team. Use the Engineering Time label for issues and MRs that are done as part of it so we can follow the work and the results across the engineering division (a query sketch follows the list of examples below). Examples include:
- Contributing to broad engineering initiatives and participating in working group-related tasks.
- Reviewing fixes from our support team. These merge requests are tagged with the Support Team Contributions label. You can filter on open MRs.
- Working on high-priority issues as a result of issue triaging. This is our commitment to the community, and we need to include some capacity to review MRs or work on defects raised by the community.
- Improvements to the performance, stability, and scalability of a feature or dependency, including underlying infrastructure. Again, the Product team should be involved in the definition of these issues, but Engineering may lead here by planning, prioritizing, and coordinating the recommended improvements.
- Improvements and upgrades to our toolchain in order to boost efficiency.
- Codebase improvements: Removing technical debt, updating or replacing outdated dependencies, and enhancing logging and monitoring capabilities.
- Constructing proof-of-concept models for thorough exploration of new technologies, enhancements, and new possibilities.
- Working on improvements and feature enhancements to the product (in the spirit of internal community contributions) that increase our internal engineering productivity, by focusing on ready-to-go items currently assigned a low priority in the backlog.
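Because the Engineering Time label is the tracking mechanism for this budget, it helps to be able to pull the labeled work programmatically. Below is a minimal sketch using the GitLab REST API (v4); the group path and token handling are illustrative assumptions to adapt for your own group, not a prescribed tool:

```python
import os
import requests

# Minimal sketch: list open issues carrying the "Engineering Time" label
# via the GitLab REST API (v4). Group path and token are placeholders.
GITLAB_API = "https://gitlab.com/api/v4"
HEADERS = {"PRIVATE-TOKEN": os.environ["GITLAB_TOKEN"]}  # token with read_api scope


def engineering_time_issues(group_path="gitlab-org"):
    """Yield open issues labeled 'Engineering Time' in a group, page by page."""
    params = {"labels": "Engineering Time", "state": "opened", "per_page": 100}
    page = 1
    while True:
        resp = requests.get(
            f"{GITLAB_API}/groups/{group_path}/issues",
            params={**params, "page": page},
            headers=HEADERS,
        )
        resp.raise_for_status()
        batch = resp.json()
        if not batch:
            break
        yield from batch
        page += 1


if __name__ == "__main__":
    for issue in engineering_time_issues():
        print(issue["web_url"], "-", issue["title"])
```

The same `labels` parameter works on the group `merge_requests` endpoint if you want the MRs rather than the issues.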
Technical Roadmaps
Some of the above examples for the 40% time budget can help in forming a long-term technical roadmap for your group, and in determining how best to prioritize your technical work to support overall business goals. In addition to the examples above:
- Ask yourself these questions
- What are your most frequent sources of delay? (This could be long-standing tech debt you have to work around while developing, a lack of reviewers for your domain, or something external to your team, such as pipeline duration.)
- Do similar bugs or security issues consistently come in from a certain area?
- Has your team been talking about potentially refactoring any areas?
- Is your team struggling with certain processes?
- Have you had recent incidents that point to a larger problem?
- Are you getting frequent requests for help in some area?
- Is your team frequently missing their deliverable commitments? What would help?
- Does your area have performance (slow endpoints, inconsistent responses, intermittent errors) or scalability (the feature or area as-is will not scale) concerns?
- Where do you see the biggest instability? Have you talked to operations and support about feedback for your area?
- Do you have application or rate limits in the right places?
- Have you burned down your security, corrective action, and infradev issues?
- Is your error budget green?
- Have your feature flags been removed from the codebase yet?
- Do you have adequate unit, integration, and E2E test coverage?
- Do you have adequate documentation for your features?
- Do you have adequate telemetry, logging, and monitoring of your features?
- Do you have adequate error handling and error codes that allow fast and easy diagnostics?
- Gather data like this (a counting sketch follows at the end of this list)
- master:broken issues
- ~"severity::1" and ~"severity::2" bugs
- Missed-SLO issues
- Flaky test issues
- ~"type::maintenance" issues
- Think about the future state of your product
- Where do you want your product to be this time next year?
- What are the technical requirements to achieve that?
- What are technical topics that would benefit from research/POCs?
- Which obstacle, if it were no longer a factor, would make that easier to achieve?
- What would be the performance and/or business impact once you address these issues?
- How would you evolve your team processes to regularly review your technical roadmap?
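Much of the data above can be gathered with the GitLab REST API as well. Here is a rough sketch that counts open issues per label to seed a roadmap conversation; the label names and group path are illustrative examples, not a canonical list:

```python
import os
import requests

# Sketch: rough per-label counts of open issues, to seed roadmap discussions.
# Label names and group path are illustrative; adjust to your group's taxonomy.
GITLAB_API = "https://gitlab.com/api/v4"
HEADERS = {"PRIVATE-TOKEN": os.environ["GITLAB_TOKEN"]}

LABELS = ["severity::1", "severity::2", "type::maintenance", "failure::flaky-test"]


def open_issue_count(group_path: str, label: str) -> str:
    """Return the open-issue count for one label, read from the X-Total header."""
    resp = requests.get(
        f"{GITLAB_API}/groups/{group_path}/issues",
        params={"labels": label, "state": "opened", "per_page": 1},
        headers=HEADERS,
    )
    resp.raise_for_status()
    # GitLab omits X-Total beyond 10,000 results; treat that case as "10000+".
    return resp.headers.get("X-Total", "10000+")


for label in LABELS:
    print(f"{label}: {open_issue_count('gitlab-org', label)}")
```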
Technical roadmap process
Engineering Managers (EMs) are responsible for collaboratively developing their team’s technical roadmap backlog. All items should be documented as epics and issues using the “Technical Roadmap” label.
Global initiatives will be defined and must be incorporated into each group’s roadmap and prioritization (e.g., allocating 40% of front-end capacity for Vue upgrade, completing all Cells issues for a specific area by milestone XYZ).
Prioritization of items should align with:
- General business goals
- Engineering vision
- Team capacity and expertise
Planning Guidelines:
- Allocate 40% of the overall time budget for technical roadmap items in the normal milestone planning process.
- Use the “Technical Roadmap” label for all related issues to facilitate tracking and coordination.
Key Steps:
- Identify and document technical debt and improvement opportunities
- Assess impact and effort for each item
- Prioritize based on business value and strategic alignment
- Integrate with existing iteration/milestone planning
- Regularly review and adjust the roadmap
This process ensures a balanced approach between feature development and technical improvements, promoting long-term sustainability and efficiency of the engineering organization.
We have a 3-year goal of reaching 1,000 monthly contributors as a way to mature new stages, add customer-desired features that aren’t on our roadmap, and even translate our product into multiple languages.
Diversity
Diverse teams perform better. They provide a sense of belonging that leads to higher levels of trust, better decision making, and a larger talent pool. They also focus more on facts, process facts more carefully, and are more innovative. By hiring globally and increasing the numbers of women and underrepresented groups (URGs) in the Engineering division, we’re helping everyone bring their best selves to work.
Growing our team
Strategic hiring is a top priority, and we’re excited to continue hiring people who are passionate about our product and have the skills to make it the best DevSecOps tool in the market. Our current focus areas include reducing the amount of time between offer and start dates and hiring a diverse team (see above). We’re also implementing industry-standard approaches like structured, behavioral, and situational interviewing to help ensure a consistent interview process that helps to identify the best candidate for every role. We’re excited to have a recruiting org to partner with as we balance the time that managers spend recruiting against the time they spend investing in their current team members.
Expand customer focus through depth and stability
As expected, a large part of our focus is on improving our product.
For Enterprise customers, we’re refining our product to meet the levels of security and reliability that customers rightfully demand from SaaS platforms (SaaS Reliability). We’re also providing more robust utilization metrics to help them discover features relevant to their own DevOps transformations (Usage Reporting) and offering the ability to purchase and manage licenses without spending time contacting Sales or Support (E-Commerce and Cloud Licensing). Lastly, in response to Enterprise customer requests, we’re adding features to support Suggested Reviewers, better portfolio management through Work Items, and Audit Events that provide additional visibility into passive user actions.
For Free Users, we’re becoming more efficient with our open core offering, so that we can continue to support and give back to students, startups, educational institutions, open source projects, GitLab contributors, and nonprofits.
For Federal Agencies, we’re obtaining FedRAMP certification to strengthen confidence in the security standards required on our SaaS offering. This is a mandated prerequisite for United States federal agencies to use our product.
For Hosted Customers, we’re supporting feature parity between Self-Managed and GitLab Hosted environments through the Workspace initiative. We’re also launching GitLab Dedicated for customers who want the flexibility of cloud with the security and performance of a single-tenant environment.
For customers using CI/CD, we’re expanding the available types of Runners to include macOS, Linux/Docker, and Windows, and we’re autoscaling build agents.
Engineering Departments
There are five departments within the Engineering Division:
Other Related Pages
Workflows
GitLab in Production
People Management
Cross-Functional Prioritization
See the Cross-Functional Prioritization page for more information.
SaaS Availability Weekly Standup
To maintain high availability, Engineering runs a weekly SaaS Availability standup to:
- Review high severity (S1/S2) public facing incidents
- Review important SaaS metrics
- Track progress of Corrective Actions
- Track progress of Feature Change Locks
Infrastructure Items
Each week the Infrastructure team reports on incidents and key metrics. Updating these items at the top of the
Engineering Allocation Meeting Agenda
is the responsibility of the Engineering Manager for the General Squad in Reliability.
- Incident Review
- Include any S1 incidents that have occurred since the previous meeting.
- Include any incidents that required a status page update.
- SaaS Metrics Review
- Include screenshots of the following graphs in the agenda.
Development Items
For the core and expansion development departments, updates on current status of:
- Error budgets
- Reliability issues (infradev)
- Security issues
Groups under Feature Change Locks should update progress synchronously or asynchronously in the weekly agenda.
The intention of this meeting is to communicate progress and to evaluate and prioritize escalations from infrastructure.
Feature Change Locks progress reports should appear in the following format in the weekly agenda:
FCL xxxx - [team name]
- FCL planning issue:
<issue link>
- Incident Issue:
<issue link>
- Incident Review Issue:
<issue link>
- Incident Timeline:
<link to Timeline tab of the Incident issue>
- e.g. time to detection, time to initiate/complete rollback (as applicable), time to mitigation
- Cause of Incident
- Mitigation
- Status of Planned/completed work associated with FCL
Feature Change Locks
A Feature Change Lock (FCL) is a process to improve the reliability and availability of GitLab.com. We will enact an FCL anytime there is an S1 or public-facing (status page) S2 incident on GitLab.com (including the License App, CustomersDot, and Versions) determined to be caused by an engineering department change. The team involved should be determined by the author, their line manager, and that manager’s other direct reports.
If the incident meets the above criteria, then the manager of the team is responsible for:
- Forming the group of engineers working under the FCL. By default, it will be the whole team, but it could be a reduced group if there is not enough work for everyone.
- Planning and executing the FCL.
- Informing their manager (e.g. Senior Manager / Director) that the team will focus efforts towards an FCL.
- Providing updates at the SaaS Availability Weekly Standup.
If the team believes there does not need to be an FCL, approval must be obtained from either the VP of Infrastructure or VP of Development.
Direct reports involved in an active borrow should be included if they were involved in the authorship or review of the change.
The purpose is to foster a sense of ownership and accountability amongst our teams, but this should not challenge our no-blame culture.
Timeline
Rough guidance on timeline is provided here to set expectations and urgency for an FCL. We want to balance moving urgently with doing thoughtful important work to improve reliability. Note that as times shift we can adjust accordingly. The DRI of an FCL should pull in the timeline where possible.
The following bulleted list provides a suggested timeline starting from incident to completion of the FCL. “Business day x” in this case refers to the x business day after the incident.
- Day 0: Incident occurs
- Business day 1: relevant Engineering Director collaborates with VP of Development and/or VP of Infrastructure or their designee to establish if FCL is required.
- Business day 2: confirmation that an FCL is required for this incident and start planning.
- Business days 3-4: planning time
- Business days 5-9 (1 week): complete planned work
- Business days 10-11: closing ceremony, retrospective and report back to standup
Activities
During the FCL, the team’s (or teams’) exclusive focus is reliability work, and any in-flight feature work has to be paused or re-assigned. Maintainer duties can still be performed during this period and should keep other teams moving forward. Explicitly higher-priority work such as security and data loss prevention should continue as well. The team(s) must:
- Create a public Slack channel called #fcl-incident-[number], with members:
- The Team’s Manager
- The Author and their teammates
- The Product Manager, the stage’s Product leader, and the section’s Product leader
- All reviewer(s)
- All maintainer(s)
- Infrastructure Stable counterpart
- The chain-of-command from the manager to the VP (Sr Manager, Sr/Director, VP, etc)
- Create an FCL issue in the FCL Project with the information below in the description:
- Name the issue:
[Group Name] FCL for Incident ####
- Links to the incident, original change, and slack channel
- FCL Timeline
- List of work items
- Complete the written Incident Review documentation within the Incident Issue as the first priority after the incident is resolved. The Incident Review must include completing all fields in the Incident Review section of the incident issue (see incident issue template). The incident issue should serve as the single source of truth for this information, unless a linked confidential issue is required. Completing it should create a common understanding of the problem space and set a shared direction for the work that needs to be completed.
- Verify not only that all procedures were followed, but also whether improvements to those procedures could have prevented the incident
- A work plan referencing all the Issues, Epics, and/or involved MRs must be created and used to identify the scope of work for the FCL. The work plan itself should be an Issue or Epic.
- Daily, add an update comment in your FCL issue or epic using the template:
- Exec-level summary
- Target End Date
- Highlights/lowlights
- Add an agenda item in the SaaS Availability weekly standup and summarize status each week that the FCL remains open.
- Hold a synchronous
closing ceremony
upon completing the FCL to review the retrospectives and celebrate the learnings.
- All FCL stakeholders and participants shall attend or participate async. Managers of the groups participating in the FCL, including Sr. EMs and Directors, should be invited.
- Agenda includes reviewing FCL retrospective notes and sharing learnings about improving code change quality and reducing risk of availability.
- Outcome includes handbook and GitLab Docs updates where applicable.
Scope of work during FCL
After the Incident Review is completed, the team’s (or teams’) focus is on preventing similar problems from recurring and improving detection. This should include, but is not limited to:
- Address immediate corrective actions to prevent incident reoccurrence in the short term
- Introduce changes to reduce incident detection time (improve collected metrics, service-level monitoring, and visibility into which users are impacted)
- Introduce changes to reduce mitigation time (improve the rollout process through feature flags and clean rollbacks; a conceptual sketch follows this list)
- Ensure that the incident is reproducible in environments outside of production (Detect issues in staging, increase end-to-end integration test coverage)
- Improve development test coverage to detect problems (Harden unit testing, make it simpler to detect problems during reviews)
- Create issues with general process improvements or asks for other teams
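One recurring mitigation theme above is rolling out risky changes behind feature flags so they can be turned off cleanly. The sketch below shows a generic percentage-of-actors rollout check; it is a conceptual illustration of the mechanism, not GitLab’s actual feature-flag implementation:

```python
import hashlib

# Conceptual sketch of a percentage-of-actors rollout check: the mechanism
# feature flags use to expose risky changes gradually and roll them back
# instantly. Illustrative only, not GitLab's actual implementation.
def flag_enabled(flag_name: str, actor_id: int, rollout_percentage: int) -> bool:
    """Deterministically bucket an actor into [0, 100) for a given flag."""
    digest = hashlib.sha256(f"{flag_name}:{actor_id}".encode()).hexdigest()
    bucket = int(digest, 16) % 100
    return bucket < rollout_percentage


# Raising the percentage widens exposure; setting it to 0 is the "rollback".
print(flag_enabled("new_diff_renderer", actor_id=42, rollout_percentage=25))
```

Because the bucketing is deterministic per actor, an actor’s experience stays stable as the percentage is raised, and dropping the percentage to zero mitigates an incident without a code deploy.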
Examples of this work include, but are not limited to:
- Fixing items from the Incident Review which are identified as causal or contributing to the incident.
- Improving observability
- Improving unit test coverage
- Adding integration tests
- Improving service level monitoring
- Improving symmetry of pre-production environments
- Improving the GitLab Performance Tool
- Adding mock data to tests or environments
- Making process improvements
- Populating their backlog with further reliability work
- Security work
- Improve communication and workflows with other teams or counterparts
Any work for the specific team kicked off during this period must be completed, even if it takes longer than the duration of the FCL. Any work directly related to the incident should be kicked off and completed even if the FCL is over. Work paused due to the FCL should be the priority to resume after the FCL is over. Items created for other teams or on a global level don’t affect the end of the FCL.
A stable counterpart from Infrastructure will be available to review and consult on the work plan for Development Department FCLs. Infrastructure FCLs will be evaluated by an Infrastructure Director.
The Product Analytics team is responsible for maintaining Engineering Performance Indicators. Work regarding KPI / RPI is tracked using the Product Analytics task intake tracker.
Manual verification
We manually verify that our code works as expected.
Automated test coverage is essential,
but manual verification provides a higher level of confidence that features behave as intended and bugs are fixed.
We manually verify issues when they are in the workflow::verification
state.
Generally, after you have manually verified something, you can close the associated issue.
See the Product Development Flow to learn more about this issue state.
We manually verify in the staging environment whenever possible.
In certain cases we may need to manually verify in the production environment.
If you need to test features that are built for GitLab Ultimate then you can get added to the issue-reproduce
group on production and staging environments by asking in the #development Slack channel.
These groups are on an Ultimate plan.
Critical Customer Escalations
We follow the below process when an existing critical customer escalation
requires immediate scheduling of bug fixes or development effort.
Requirements for critical escalation
- Customer is in critical escalation state
- The issues escalated have critical business impact to the customer, determined by Customer Success and Support Engineering leadership
- Failure to expedite scheduling may have cascading business impact to GitLab
- Approval from a VP from Customer Success AND a Director of Support Engineering is required to expedite scheduling
Process
- The issue priority is set to
~"priority::1"
regardless of severity
- The label ~"critical-customer-escalation" is applied to the issue (see the API sketch after this list)
- The issue is scheduled within 1 business day
- For issues of type feature, approval from the Product DRI is needed.
- The DRI or their delegate provides daily process updates in the escalated customer Slack channel
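For teams that automate parts of this process, the labels above can be applied through the GitLab REST API. Here is a minimal sketch; the project path and issue IID are placeholders for illustration:

```python
import os
import requests

# Sketch: apply the escalation labels to an issue via the GitLab REST API.
# Project path and issue IID below are placeholders, not real references.
GITLAB_API = "https://gitlab.com/api/v4"
HEADERS = {"PRIVATE-TOKEN": os.environ["GITLAB_TOKEN"]}


def escalate_issue(project_path: str, issue_iid: int) -> dict:
    """Add priority::1 and critical-customer-escalation to an issue."""
    encoded = requests.utils.quote(project_path, safe="")
    resp = requests.put(
        f"{GITLAB_API}/projects/{encoded}/issues/{issue_iid}",
        headers=HEADERS,
        # Scoped labels are mutually exclusive, so adding priority::1
        # replaces any other priority:: label already on the issue.
        data={"add_labels": "priority::1,critical-customer-escalation"},
    )
    resp.raise_for_status()
    return resp.json()


escalate_issue("gitlab-org/gitlab", 12345)  # placeholder project and IID
```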
DRI
- If the issue is of type bug, the DRI is the Director of Development
- If the issue is of type feature, the DRI is the Director of Product
- If the issue requires Infrastructure work, the DRI is the Engineering Manager in Infrastructure
The DRI can use the customer critical merge requests process to expedite code review & merge.
Pairing Engineers on priority::1/severity::1 Issues
In most cases, a single engineer and maintainer review are adequate to handle a priority::1/severity::1 issue. However, some issues are highly difficult or complicated. Engineers should treat these issues with a high sense of urgency. For a complicated priority::1/severity::1 issue, multiple engineers should be assigned based on the level of complexity. The issue description should include each team member and their responsibilities, for example:

| Team Member | Responsibility |
| --- | --- |
| Team Member 1 | Reproduce the problem |
| Team Member 2 | Audit the code base for other places where this may occur |
If a case requires three, five, or more people, Engineering Managers should feel the freedom to execute on a plan quickly.
Following this procedure will:
- Decrease the time it takes to resolve priority::1/severity::1 issues
- Allow for a smooth handover of the issue in case of OOO or the end of the work day
- Provide support for Engineers if they are stuck on a problem
- Provide another set of eyes on highly urgent topics or security-related fixes
Internal Engineering handbook
There are some engineering handbook topics that are internal only. These topics can be viewed by GitLab team members in the engineering section of the internal handbook.
Complexity at Scale
As GitLab grows through the introduction of new features and improvements to
existing ones, so does its complexity. This effect is compounded by the
care and feeding of a single codebase that supports the wide variety of
environments in which it runs, from small self-managed instances to large
installations such as GitLab.com. The company itself adds to this complexity
from an organizational perspective: hundreds of employees worldwide contribute
in one way or another to both the product and the company, using GitLab.com on
a daily basis to do their job. Team members in Engineering are directly
responsible for the codebase and its operation, for the infrastructure powering
GitLab.com, and for the support of customers running self-managed instances.
Likewise, team members in the Product organization chart the future of the
product.
Vision
Our goal is not merely to launch features, but to ensure they land successfully and provide real value to our customers. We strive to develop a best-in-class product that exceeds expectations across all user groups by meeting high-quality standards while ensuring reliability and maintaining an ease of operation and scalability to meet diverse customer needs. All team members should remain mindful of our target customers and the multiple platforms we support in everything we do.
Overview
The Cross-Functional Prioritization framework exists to give everyone a voice within the product development quad (PM, Development, Quality, and UX). By doing this, we are able to achieve and maintain an optimal balance of new features, security fixes, availability work, performance improvements, bug fixes, technical debt, etc. while providing transparency into prioritization and work status to internal and external stakeholders so they can advocate for their work items. Through this framework, team members will be able to drive conversations about what’s best for their quad and ensure there is alignment within each milestone.
The CTO Leadership Team is composed of the CTO’s direct reports and the Office of the CTO (OCTO).
Office of the CTO (OCTO)
The OCTO is composed of the CTO, the Engineering EBAs, the CTO’s People Business Partners, and the CTO’s Director of Strategy and Operations. This team works to amplify the CTO’s reach, vision, and mission. They work together to deliver programs and results across the entire Engineering Division.
Overview and terminology
This page describes the deployment and release approach used to deliver changes to users. The overall process consists of two significant parts:
- Monthly self-managed release: GitLab version (XX.YY.0) published every month. From this monthly release, planned patches are scheduled twice a month and unplanned critical patches are created as needed.
- GitLab.com deployment: A Continuous Delivery process to deploy branches created from the master branch at regular intervals.
For more details on the individual processes and how to use them please see the Deployments page for GitLab.com changes and the Releases page for changes for self-managed users.
Awesome! You're about to become a GitLab developer! Here you'll find everything you need to start developing.
The Three Components of Career Development
There are three important components of developing one’s career:
Structure
Team members who are (or want to be) on track for promotion should be engaged in
a career coaching conversation with their manager. Some basic information about
this process can be found in the People Ops handbook.
Specific coaching plan templates are listed here to help start the conversation:
We want to build these documents around the career matrix for Engineering. Since this career
matrix is still being developed, these documents are currently based on the job family requirements.
Communication
GitLab Engineering values clear, concise, transparent, asynchronous, and frequent communication. Here are our most important modes of communication:
As part of a fully-distributed organization such as GitLab, it is important to stay informed about engineering-led initiatives.
We employ multimodal communication, which describes the minimum set of communication channels we’ll broadcast to.
The Engineering Division has a Google Group, engineering@gitlab.com
(internal only), that all members of the division should join as part of the onboarding process. If this is not the case for you, reach out to your manager. Because GitLab, the company, primarily communicates via Slack, use this list mainly for access control to Google Drive/Docs/Sheets/Slides.
Occasionally, it is useful to set up a demo on a regular cadence to ensure cross-functional iterative alignment.
This is helpful for high-impact deliverables that require integration across multiple functional teams. This is in line with the seventh principle of the Agile Manifesto: “Working software is the primary measure of progress.”
Demo script
For multi-person groups or critical projects, we use a heavier-weight grading process:
- The demo owner identifies the outcome of the demo based on the business criteria. This can be an engineering manager, a product manager or someone who is a business stakeholder of the outcome.
- The demo owner breaks down the outcome into smaller pieces, aligning with functional areas (tracks) and structured in procedural flow. This will later be captured as demo steps.
- List each step, however small it might look, to expose implicit dependencies.
- The demo owner identifies a functional team leader as a DRI for each demo track. The DRI for each track is responsible for demoing each track to completion.
- The demo owner collaborates with functional team leaders to populate the demo steps in a scorecard. Here is the demo scorecard template. To use this template:
- Copy the template and rename it to the initiative/deliverable.
- Clear the scores in the scorecard sheet.
- Populate the demo tracks and demo steps.
- Note: Here is an example of a populated demo scorecard.
- The demo owner identifies a demo grader to hold grading accountability. This can be the demo owner or someone who is familiar with the product domain and customers’ use cases. It is important that the demo grader is someone who can advocate for the success of our end users.
Demo scheduling
- Once the script is finalized, the demo owner schedules a recurring recorded meeting for the demo with a target end date.
- The demo owner and demo grader must be present in every demo to ensure accountability. Assign delegates appropriately for one-off unavailability.
- Create an agenda document that each participant can take notes in, in addition to the scorecard.
- The audience is the key business stakeholders of the demo deliverables and the product group team (Development, UX, Quality, Product).
- The meeting should be kept to 30 minutes. The emphasis should be on the product requirements and acceptance criteria.
- The demo gets kicked off, and each demo track iterates weekly on progress until completion.
- Live streaming or uploading to the GitLab Unfiltered channel is optional. Please abide by our SAFE guidelines if you choose to do so.
Demo grading
The demo grader grades each step during the demo meeting. To make grading less subjective, we use a scale that is widely understood and communicated.
Our scoring definitions are as follows:
The error budget provides a clear, objective metric that determines how unreliable the service is allowed to be within a single quarter.
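As a worked example of the arithmetic (the 99.95% target below is an illustrative assumption, not the official SLO):

```python
# Worked example: converting an availability target into a quarterly
# error budget. The 99.95% target here is illustrative, not the official SLO.
target_availability = 0.9995
days_per_quarter = 91
minutes_per_quarter = days_per_quarter * 24 * 60  # 131,040 minutes

budget_minutes = minutes_per_quarter * (1 - target_availability)
print(f"Allowed unavailability per quarter: {budget_minutes:.0f} minutes")
# ~66 minutes: every incident-minute spends this budget, and a "green"
# error budget means the spend is still below the allowance.
```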
GitLab engineers: work with an Engineering Fellow for a week
Executive Summary
Engineering Handbook MR Rate
The handbook is essential to working remotely successfully, to keeping
up our transparency, and to recruiting successfully. Our processes are constantly
evolving, and we need a way to make sure the handbook is being updated at a regular
cadence. This is measured by merge requests that update handbook contents related to the Engineering Division over time.
Overview
Hiring is a cornerstone of success for our engineering organization, contributing to our growth and our ability to drive results for our customers. As such, it’s not just a responsibility but fundamental to every engineer’s contribution to GitLab. It should be deeply ingrained in every engineer’s role at GitLab, regardless of their seniority.
By actively participating in recruitment efforts, engineers help shape their team culture, elevate technical standards, and ensure a continuous influx of diverse perspectives and skillsets. Contributing to hiring efforts allows GitLab to grow responsibly and affects our collective success within Engineering.
Engineering IC Leadership at GitLab: going beyond Senior level
At GitLab, it is expected that everyone is a manager of one. For Individual Contributors (ICs), a new type of challenge begins with the Staff Engineer role. Engineering IC Leadership is an alternative career path to Engineering Management.
Just like moving into management, moving from Senior to Staff changes the day-to-day work and expectations placed on ICs.
Engineering IC Leaders exert technical leverage in their scope of influence.
Like any other leadership role, the focus should be on helping others to improve.
Their impact multiplies with every person they help grow, and the company gets more value when they’re not investing time in doing things themselves.
How Engineering Management Works at GitLab
At GitLab, we promote two paths for leadership in Engineering. While there is a
healthy degree of overlap between these two ideas, it is helpful and efficient
for us to specialize training and responsibility for each of:
While technical leadership tends to come naturally to software engineers,
professional leadership can be more difficult to master.
Mentorship, Coaching and Engineering Programs
Line Managers and Senior Individual Contributors
The PlatoHQ Program has a total of 10 Engineering Managers/Senior ICs participating. The program consists of both self-learning via an online portal and 1-1 sessions with a mentor.
Senior Leaders in Engineering
The 7CTOs Program is run with 4 senior leaders in Engineering. The program consists of peer mentoring sessions (forums) and effective network building.
AI Gateway
AI Gateway for GitLab Duo features.
Learn about GitLab's secondment program for external engineers.
This document explains the workflow for anyone working with issues in GitLab Inc.
Vision
Scale and develop our diverse, global team to drive results that support our product and customer growth, while maintaining our values and unique way of working.
Mission
GitLab’s unique way of working asynchronously, handbook first, using the product we develop, and with clear focus on our values enables very high productivity. In delivering on growth, we maintain our values and ways of working while developing team members and increasing the diversity of our team. We focus on constantly improving usability and reliability of our product to reach maximum customer satisfaction. Community contributions and customer interactions rely on efficient and effective communication. We are a data-driven, customer experience first, open core organization delivering one secure, reliable, world leading DevOps platform.
A Fast Boot is an event that gathers the members of a team or group in one
physical location to work together and bond in order to accelerate the
formation of the team or group so that they reach maximum productivity as
early as possible.
History of the Fast Boot
- The first Fast Boot took place in December 2018. The 13 members of Monitor
Group gathered for 3 days to work and bond in Berlin. You can learn more
by reading the original planning issue.
- The second Fast Boot took place in April 2019. The 5 members of the Delivery team
gathered in Utrecht to bond, but also to work on finalizing the auto-deployment process.
The planning issue contains the proposal for the Fast Boot, and
the Delivery Fast Boot epic
contains issues and links to recordings created during the Fast Boot.
- The third Fast Boot took place in Vancouver in September 2019. It included 18 people from Product, Engineering, UX and Data from the Acquisition, Conversion, Expansion and Retention teams. The planning issue contains the proposal for Fast Boot, and outcomes are available in the Growth Fast Boot Page.
Why should you have a Fast Boot?
Right now, the Fast Boot is intended for new teams or for teams with a majority
of new members who need to build their culture of shipping work. If your team
fits this description, you can propose holding a Fast Boot to reduce ramp-up
time and establish and strengthen relationships between team members.
Teams
Frontend domain experts
You can find engineers with expertise in various frontend domains on the engineering
projects page under the following sections:
You can reach out to these experts to get help on:
- discussing and defining the architecture of complex frontend features.
- frontend technical topics like Vue, GraphQL, CSS, testing, tooling, etc.
- proposing changes to the cross-domain frontend architecture via an RFC.
- questions about the frontend for a product area like design management, merge requests, pipelines, etc.
Frontend group calls
The frontend group has scheduled weekly calls every Tuesday. Since 2021-06-01, these occur at three staggered, time-zone-friendly times, repeating every three weeks. During these calls, team members are encouraged to share information that may be relevant to other members synchronously (e.g. a new documentation change, or new breaking changes added to master).
Background
As part of the FY25-Q2 Engagement Survey Results & Action Planning, we identified Team Member Development & Engagement as an area to focus on. One of the actions we took was to identify a way to provide Engineering get-togethers for an increased sense of belonging.
After looking at different possibilities based on budget, we were able to provide a subsidy in FY25 to facilitate these get-togethers, both in an in-person format and virtually.
Program Overview
GitLab has partnered with Plato HQ for an external Mentoring Program, in which GitLab team members select Mentors external to GitLab. Some of the other mentoring programs we have at GitLab are internal: Minorities in Tech and Women in Sales are both made up of GitLab Mentors and GitLab Mentees. The external mentoring is what makes this program unique at GitLab.
For more information on mentoring best practice, visit Mentoring.
GitLab consists of many subprojects. A curated list of GitLab projects can be found at the GitLab Engineering projects page.
Creating a new project
When creating a new project, please follow these steps:
- Read and familiarize yourself with our stance on Dogfooding. Be aware that, as part of a product development organization that builds a tool for people like us, our default is to add features and tooling to the GitLab project. This is still true when the effort to do so is 2-5x. Despite this, if you still feel you need to create a project outside of GitLab, you must follow this process to document the decision.
Guidelines for automation with project/group tokens or service accounts
Definition of an Incident
The definition of “incident” can vary widely among companies and industries. Here at GitLab, incidents are anomalous conditions that result in — or may lead to — service degradation, outages, or other disruptions. These events require human intervention to avert disruptions, communicate status, restore normal service, and identify future improvements.
Incidents are always given immediate attention.
Incident Management
Incident Management is the process of responding to, mitigating, and documenting an incident. At GitLab, we approach Incident Management as a feedback loop with the following steps, with different teams adjusting them as needed:
The Infrastructure Department is responsible for the availability, reliability, performance, and scalability of GitLab.com and other supporting services
Vision
Our vision is to be a world-class Infrastructure & Tools department that enables GitLab to meet & exceed our customers’ needs.
We:
- Build critical infrastructure, metrics & tools that enable GitLab Engineering & Product teams to do their best work efficiently and ship high-quality & reliable products to our customers.
- Are customer focused. We have an ambitious drive to attain high availability & reliability for SaaS platforms and self-managed customers.
- Provide and maintain best practice tools and methodologies that create a platform for engineering teams to do their work productively.
- Enable GitLab Engineering & Product teams to run services effectively using our tools, to meet business needs & SLOs.
Direction
Direction is set within the Infrastructure and Quality direction pages. With the ongoing consolidation of the departments, separate direction pages will become obsolete.
The Infrastructure Platforms department is responsible for the availability, reliability, performance, and scalability of GitLab.com and other supporting services
R&D OKR Overview
This page provides an overview of the joint R&D OKR workflow. All departments within R&D, which includes the Product and Engineering Divisions, collaborate by following this guidance. For clarifications on the OKR process, team members can post in Slack #product or #engineering-fyi.
Timeline and process for OKRs
The OKR process is designed to tie in to the overall OKR process the company uses. That process is driven largely off of the date of the Key Review meetings, so the Product process keys off of that date as well. Dates will not necessarily align with the start of a fiscal quarter as a result.
The calculation methodology for the GitLab.com Service Availability definition is in the monitoring policy.
More details on the definitions of outage and degradation are on the incident-management page.
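As a rough illustration of how a monthly figure in the table below relates to downtime (the real methodology, including what counts as an outage or degradation, lives in the monitoring policy):

```python
# Sketch of how a monthly figure like those in the table below can be
# derived from recorded downtime. The real methodology is defined in the
# monitoring policy; this only illustrates the arithmetic.
def monthly_availability(downtime_minutes: float, days_in_month: int) -> float:
    total_minutes = days_in_month * 24 * 60
    return 100 * (1 - downtime_minutes / total_minutes)


# Example: ~90 minutes of service disruption in a 30-day month.
print(f"{monthly_availability(90, 30):.2f}%")  # 99.79%
```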
Historical Service Availability
| Year Month | Availability | Comments |
| --- | --- | --- |
| 2024 November | 100.00% | |
| 2024 October | 99.66% | |
| 2024 September | 99.85% | |
| 2024 August | 100.00% | |
| 2024 July | 99.99% | |
| 2024 June | 99.99% | |
| 2024 May | 100.00% | |
| 2024 April | 99.96% | |
| 2024 March | 100% | |
| 2024 February | 99.86% | |
| 2024 January | 100% | |
| 2023 December | 99.99% | |
| 2023 November | 99.99% | |
| 2023 October | 99.89% | Oct 30 Sev 1 |
| 2023 September | 99.98% | |
| 2023 August | 100% | |
| 2023 July | 99.78% | Two severity 1 incidents contributed to ~94% of service disruption. 2023-07-07, 2023-07-14 |
| 2023 June | 100% | |
| 2023 May | 99.92% | |
| 2023 April | 99.98% | |
| 2023 March | 99.99% | |
| 2023 February | 99.98% | |
| 2023 January | 99.80% | |
| 2022 December | 100% | |
| 2022 November | 99.86% | |
| 2022 October | 100% | |
| 2022 September | 99.98% | |
| 2022 August | 99.92% | |
| 2022 July | 99.95% | |
| 2022 June | 99.96% | |
| 2022 May | 99.99% | |
| 2022 April | 99.98% | |
| 2022 March | 99.91% | |
| 2022 February | 99.87% | |
| 2022 January | 99.95% | |
| 2021 December | 99.96% | |
| 2021 November | 99.71% | |
| 2021 October | 99.98% | |
| 2021 September | 99.85% | |
| 2021 August | 99.86% | |
| 2021 July | 99.78% | |
| 2021 June | 99.84% | |
| 2021 May | 99.85% | does not include manual adjustment for PostgreSQL 12 Upgrade |
| 2021 April | 99.98% | |
| 2021 March | 99.34% | |
| 2021 February | 99.87% | |
| 2021 January | 99.88% | |
| 2020 December | 99.96% | |
| 2020 November | 99.90% | |
| 2020 October | 99.74% | |
| 2020 September | 99.95% | |
| 2020 August | 99.87% | |
| 2020 July | 99.81% | |
| 2020 June | 99.56% | |
| 2020 May | 99.58% | |
Related Pages
These videos provide examples of how to quickly identify failures, defects, and problems related to servers, networks, databases, security, and performance.
If you’re a GitLab team member and are looking to alert Reliability Engineering about an availability issue with GitLab.com, please find quick instructions to report an incident here:
Reporting an Incident.
If you’re a GitLab team member looking for who is currently the Engineer On Call (EOC), please see the
Who is the Current EOC? section.
Expectations for On-Call
- If you are on call, then you are expected to be available and ready to respond to PagerDuty pages as soon as possible, and within any response times set by our Service Level Agreements in the case of Customer Emergencies. If you have plans outside of your workspace during your on-call shift, this may require that you bring a laptop and reliable internet connection with you.
- We take on-call seriously. There are escalation policies in place so that if a first responder does not respond in time, another team member is alerted. Such policies are not expected to be triggered under normal operations, and are intended to cover extreme and unforeseeable circumstances.
- Because GitLab is an asynchronous workflow company, @mentions of On-Call individuals in Slack will be treated like normal messages, and no SLA for response will be associated with them.
- Provide support to the release managers in the release process.
- As noted in the main handbook, after being on-call, make sure that you take time off. Being available for issues and outages can be taxing, even if you had no pages. Resting after your on-call shift is critical for preventing burnout. Be sure to inform your team of the time you plan to take for time off.
- During on-call duties, it is the team member’s responsibility to act in compliance with local rules and regulations. If ever in doubt, please reach out to your manager and/or aligned People Business Partner.
Customer Emergency On-Call Rotation
- We do 7 days of 8-hour shifts in a follow-the-sun style, based on your location.
- After 10 minutes, if the alert has not been acknowledged, support management is alerted. After a further 5 minutes, everyone on the customer on-call rotation is alerted.
- All tickets that are raised as emergencies will receive the emergency SLA. The on-call engineer’s first action will be to determine if the situation qualifies as an emergency and work with the customer to find the best path forward.
- After 30 minutes, if the customer has not responded to our initial contact with them, let them know that the emergency ticket will be closed and that you are opening a normal priority ticket on their behalf. Also let them know that they are welcome to open a new emergency ticket if necessary.
- You can view the schedule and the escalation policy on PagerDuty. You can also opt to subscribe to your on-call schedule, which is updated daily.
- After each shift, if there was an alert or incident, the on-call person will send a handoff email to the next on-call explaining what happened and what’s ongoing, pointing at the relevant issues with the progress.
- If you need to reach the current on-call engineer and they’re not accessible on Slack (e.g. it’s a weekend, or the end of a shift), you can manually trigger a PagerDuty incident to get their attention, selecting Customer Support as the Impacted Service and assigning it to the relevant Support Engineer.
- See the GitLab Support On-Call Guide for a more
comprehensive guide to handling customer emergencies.
Infrastructure Engineer On-Call
The Infrastructure department’s SREs provide 24x7 on-call coverage for the production environment. For details, please see incident-management.
We believe in Open Source
As a company, GitLab is dedicated to open source. Not only do we believe in it, but we use it, and we give back to it. Not just through GitLab, but through contributions to other open source projects.
The purpose of this page is to document how a GitLab employee can:
- Create an open source project on behalf of GitLab
- Contribute to a third-party open source project on behalf of GitLab
- Use third-party open source code in a GitLab project
Growth Strategy
As an open source project, we want to stay healthy and be open for growth, but also ready to accommodate 10x growth of
our community. To achieve that, we’ve outlined a strategy that is a collaboration between multiple departments.
We categorize performance into 3 facets:
- Backend
- Frontend
- Infrastructure
Backend performance is scoped to the response time of APIs, controllers, and command-line interfaces (e.g. git).
DRI: Tim Zallman, VP of Engineering, Core Development.
Performance Indicators:
Frontend performance is scoped to the response time of the visible pages and UI components of GitLab.
DRI: Tim Zallman, VP of Engineering, Core Development
The handbook pages nested under the “policies” directory are controlled documents and follow a specific set of requirements to satisfy various regulatory obligations.
Avoid nesting non-controlled documentation at this location.
The Quality Department in the Engineering Division
GitLab submits applications for R&D Tax Credits in a number of jurisdictions that implement reimbursement schemes for research and development. A subject-matter expert (SME) from engineering is appointed to each application to assist with data collection. A third-party tax agent prepares and submits the report. SMEs are usually Engineering Managers or Directors and located in, or with reasonable knowledge of, the jurisdiction under application.
Role of the SME
The role of the SME is twofold:
Engineering Quarterly Achievers
Quarterly, CTO Leadership will recognize Engineering team members who have excelled in a given quarter. Recognition includes:
- an invitation to the Engineering Quarterly Achievers Chat
- participation in the Engineering Quarterly Achievers Recognition Dinner: an expensed meal for yourself, friends, and family to celebrate your work. The meal must occur before the last day of the quarter following the announcement.
Winners each quarter have until the last day of the quarter to submit for reimbursement.
Winners may submit their receipt for the meal for reimbursement via Navan.
Please see the instructions below.
- In Navan, click Add Transaction, then select Upload receipt (or select Type in details).
- Under the Expense Type field, select “Team events & meals”.
- Under Classification, select “FY25 Team Building”.
- Under the Description field, include this link: https://handbook.gitlab.com/handbook/engineering/recognition/#engineering-quarterly-achievers-recognition-dinner
- Click Submit (or Save & close if you need to come back to add more information).
Overview and terminology
This page describes the processes used to release packages to self-managed users.
Monthly self-managed release
A GitLab version (XX.YY.0) is published every month. From this monthly release, planned patch releases are scheduled and unplanned critical patch releases are created as needed.
Our maintenance policy describes in detail the cadence of our major, minor and patch releases for self-managed users. The major release yearly cadence was defined after an all stakeholder discussion.
Self-managed overview
The self-managed release
is a semver-versioned package containing changes from many successful deployments on GitLab.com. Users on GitLab.com, therefore, receive features and bug fixes earlier than users of self-managed installations.
At GitLab transparency is one of our core values, as it helps create an open and honest working environment and service, which in turn accelerates growth and innovation. We treat a root cause analysis (RCA) as an opportunity to be transparent amongst our organization and community by investigating what went well and what didn’t after working on a project, incident, or issue. This page defines an RCA, the benefits of completing them, and how to complete a successful RCA here at GitLab.
Starting new teams
Our product offering is growing rapidly. Occasionally we start new teams. Backend teams should map to our product categories. Backend teams also map 1:1 to product managers.
A dedicated team needs certain skills and a minimum size to be successful. But that doesn’t block us from taking on new work. This is how we iterate our team size and structure as a feature set grows:
- Existing Team: The existing PM schedules issues for most appropriate existing engineering team
- If there is a second PM for this new feature, they work through the first PM to preserve the 1:1 interface
- Shared Manager Team: Dedicated engineer(s) are identified on existing teams and given a specialty
- The manager must do double-duty
- Their title can reflect both specialties of their engineers e.g. Engineering Manager, Distribution & Package
- Even if temporary, managing two teams is a valuable career opportunity for a manager looking to develop director-level skills
- Each specialty can have its own process, for example: Capitalized team label, Planning meetings, Standups
- New Dedicated Team:
- Engineering Manager
- Senior/Staff Engineer
- Two approved full-time vacancies
- A dedicated PM
Team Construction
Generally, engineering teams at GitLab are fullstack: they are made up of Frontend, Backend, and Fullstack individual contributors with a single Engineering Manager.
An unplanned upgrade stop is disruptive for customers, as it requires performing a rollback and additional maintenance work to complete the upgrade. Unplanned stops should be treated as incidents. The process below outlines the different stages of the incident resolution process and the steps to be taken by the corresponding teams and Directly Responsible Individuals (DRIs).
High-level workflow:
- Detect unplanned upgrade stop: Identify instances of unplanned upgrade stops.
- Resolve upgrade bug: Backport the fix or update the upgrade path to include the new stop.
- Perform Unplanned Upgrade Stop Root Cause Analysis: Understand why the stop occurred and prevent future incidents.
What is an unplanned upgrade stop?
An unplanned upgrade stop happens when we fail to communicate the necessity of an upgrade stop in our upgrade path. For more information, read what an unplanned upgrade stop is.
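To make the concept concrete, here is a minimal sketch of computing which required stops an upgrade passes through; the stop list is hypothetical, and the official upgrade path documentation remains the source of truth:

```python
# Minimal sketch: which required stops does an upgrade pass through?
# The stop list is hypothetical, for illustration only; the official
# upgrade path is documented separately. An *unplanned* stop is one
# that users discover was required but was missing from this list.
REQUIRED_STOPS = [(16, 3), (16, 7), (17, 3)]  # illustrative (major, minor) stops


def stops_between(current: tuple, target: tuple) -> list:
    """Return the ordered intermediate versions an upgrade must step through."""
    return [v for v in REQUIRED_STOPS if current < v < target]


print(stops_between((16, 1), (17, 5)))  # [(16, 3), (16, 7), (17, 3)]
```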
Pilot Program Overview
This program allows team members at GitLab to volunteer and donate their time and technical skills (such as programming or Linux administration) to provide knowledge, support, and coaching to members of underrepresented groups (URGs) in the technology industry. The hope is that we can help people who have been denied opportunity for whatever reason and who desire to get their first job in the technology industry.
This program is in pilot as of November 1, 2020. Please reach out to the contacts below if you are interested in taking part.