Infrastructure

The Infrastructure Department is responsible for the availability, reliability, performance, and scalability of GitLab.com and other supporting services.

Mission

The Infrastructure Department enables GitLab (the company) to deliver a single DevOps application, and GitLab SaaS users to focus on generating value for their own businesses by ensuring that we operate an enterprise-grade SaaS platform.

The Infrastructure Department does this by focusing on availability, reliability, performance, and scalability efforts. Cost efficiency is an additional driving force behind these responsibilities, reinforced by properly prioritized dogfooding efforts.

Many other teams also contribute to the success of the SaaS platform. However, it is the responsibility of the Infrastructure Department to drive the ongoing evolution of the SaaS platform, enabled by platform observability data.

Getting Assistance

If you’re a GitLab team member and are looking to alert the Infrastructure teams about an availability issue with GitLab.com, please find quick instructions to report an incident here: Reporting an Incident.

For all other queries, please see the getting assistance page.

Vision

The Infrastructure Department operates a fast, secure, and reliable SaaS platform to which (and with which) everyone can contribute.

Integral parts of this vision are to:

  1. Build a highly performant team of engineers, combining operational and software development experience to influence the best in reliable infrastructure.
  2. Work publicly in accordance with our transparency value.
  3. Use our own product to prepare, build, deliver work, and support the company strategy.
  4. Align our strategy with industry trends, company direction, and end customer needs.

Direction

The direction is accomplished by using Objectives and Key Results (OKRs).

Other strategic initiatives to achieve this vision are driven by the needs of enterprise customers looking to adopt GitLab.com. The GitLab.com strategy catalogs top customer requests for the SaaS offering and outlines strategic initiatives across both Infrastructure and Stage Groups needed to address these gaps.

We are also Product Development

Unlike typical companies, part of the mandates of our Security, Infrastructure, and Support Departments is to contribute to the development of the GitLab Product. This follows from these concepts, many of which are also behaviors attached to our core values:

As such, everyone in the department should be familiar with, and be acting upon, the following statements:

  • We should all feel comfortable contributing to the GitLab open source project
  • If we need something, our first instinct should be to get it into the open source project so it can be given back to the community
  • Try to get it in the open source project first, rather than later, even if it’s 2x harder
  • We should be using the whole product to do our jobs
  • We are all familiar with our Dogfooding process and follow it
  • We should not expect new team members to join the company with these instincts, so we should be willing to teach them
  • It is part of managers’ responsibility to teach these values and behaviors

Organization structure

(click the boxes for more details)

flowchart LR
    I[Infrastructure Platforms]
    click I "/handbook/engineering/infrastructure/"

    I --> DA[Data Access]
    click DA "/handbook/engineering/infrastructure-platforms/data-access/"
    I --> D[Dedicated]
    click D "/handbook/engineering/infrastructure/team/gitlab-dedicated/"
    I --> DE[Developer Experience]
    click DE "/handbook/engineering/infrastructure-platforms/developer-experience/"
    I --> PE[Production Engineering]
    click PE "/handbook/engineering/infrastructure-platforms/production-engineering/"
    I --> SD[Software Delivery]
    click SD "/handbook/engineering/infrastructure/team/delivery/"
    I --> TS[Tenant Scale]
    click TS "/handbook/engineering/infrastructure-platforms/tenant-scale/"

    DA --> GC[Gitaly]
    click GC "/handbook/engineering/infrastructure-platforms/data-access/gitaly/"
    DA --> Git[Git]
    click Git "/handbook/engineering/infrastructure-platforms/data-access/git/"
    DA --> DF[Database Framework]
    click DF "/handbook/engineering/infrastructure-platforms/data-access/database-framework/"
    DA --> DO[Database Operations]
    click DO "/handbook/engineering/infrastructure-platforms/data-access/database-operations/"
    DA --> DU[Durability]
    click DU "/handbook/engineering/infrastructure-platforms/data-access/durability/"

    D --> E[Environment Automation]
    click E "/handbook/engineering/infrastructure/team/gitlab-dedicated/"
    D --> PSS[Public Sector Services]
    click PSS "/handbook/engineering/infrastructure/team/gitlab-dedicated/us-public-sector-services/"
    D --> Switchboard
    click Switchboard "/handbook/engineering/infrastructure/team/gitlab-dedicated/switchboard/"

    DE --> EA[Development Analytics]
    click EA "/handbook/engineering/infrastructure-platforms/developer-experience/engineering-analytics/"
    DE --> DT[Developer Tooling]
    click DT "/handbook/engineering/infrastructure-platforms/developer-experience/developer-tooling/"
    DE --> FE[Feature Readiness]
    click FE "/handbook/engineering/infrastructure-platforms/developer-experience/"
    DE --> PER[Performance Enablement]
    click PER "/handbook/engineering/infrastructure-platforms/developer-experience/performance-enablement/"
    DE --> TG[Test Governance]
    click TG "/handbook/engineering/infrastructure-platforms/developer-experience/test-governance/"

    PE --> CC[Cloud Connector]
    click CC "/handbook/engineering/infrastructure/team/cloud-connector/"
    PE --> Foundations
    click Foundations "/handbook/engineering/infrastructure-platforms/production-engineering/foundations/"
    PE --> Observability
    click Observability "/handbook/engineering/infrastructure/team/scalability/"
    PE --> Ops
    click Ops "/handbook/engineering/infrastructure/team/ops/"
    PE --> Runway
    click Runway "/handbook/engineering/infrastructure/team/runway/"

    SD --> DB[Build]
    click DB "/handbook/engineering/infrastructure-platforms/gitlab-delivery/distribution/"
    SD --> DD[Deploy]
    click DD "/handbook/engineering/infrastructure-platforms/gitlab-delivery/distribution/"
    SD --> FR[Framework]
    click FR "/handbook/engineering/infrastructure-platforms/gitlab-delivery/framework/"
    SD --> RE[Releases]
    click RE "/handbook/engineering/infrastructure-platforms/gitlab-delivery/delivery/"
    SD --> SM[Self-managed]
    click SM "/handbook/engineering/infrastructure-platforms/gitlab-delivery/delivery/"

    TS --> Geo
    click Geo "/handbook/engineering/infrastructure-platforms/tenant-scale/geo/"
    TS --> Organizations
    click Organizations "/handbook/engineering/infrastructure-platforms/tenant-scale/organizations/"
    TS --> Cells
    click Cells "/handbook/engineering/infrastructure-platforms/tenant-scale/cells-infrastructure/"

Technical Roadmap

Infrastructure maintains a Technical Roadmap for planning projects over the short (1 year), medium (2 years), and long (3 years) term. This serves as our strategic compass, helping us balance immediate needs with long-term sustainability.

The Technical Roadmap is based on the Product Roadmap, where Product provides the “What” (customer needs) and “Why” (business strategy). Engineers then determine the “How” (technical implementation), while Engineering Managers plan the “When” (scheduling). This comprehensive roadmap emphasizes building high-quality, complete features in a sustainable manner.

The Technical Roadmap serves three key purposes:

  1. It helps build engineering excellence by addressing critical areas that might not show up in product backlogs, such as technical debt, performance improvements, platform improvements, and system scalability.

  2. It enables the department to be proactive rather than reactive. By regularly asking key questions like “Where do we see the biggest instability in our systems?” or “What is generating the most toil?”, we can address issues before they become critical problems. This helps maintain our SLOs and keeps our customers happy.

  3. It aligns engineering efforts with business goals, ensuring technical improvements drive GitLab’s success. Each technical roadmap item is prioritized based on business value and strategic alignment.

Current State

The Infrastructure Roadmap is maintained as a static site. GitLab team members can review the current technical roadmap at infra-roadmap.gitlab.com.

NOTE: The Infrastructure Roadmap is not publicly available as some of the projects and initiatives may be considered unSAFE.

The site presents the roadmap in a visual manner, showing:

  • Dependencies between planned initiatives
  • Filtering options by confidence, stage, or tags
  • Individual roadmaps for each stage within the department
  • Impact analysis through dependency visualization
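The filtering and impact analysis listed above can be illustrated with a minimal Python sketch. It is not the roadmap site's implementation: the `Initiative` fields, the helper functions, and the sample entries are all hypothetical, chosen only to show how filtering by stage, confidence, or tag and walking dependencies might work.

```python
from dataclasses import dataclass, field


@dataclass
class Initiative:
    """Hypothetical roadmap entry; the field names are illustrative only."""
    name: str
    stage: str
    confidence: str
    tags: set[str] = field(default_factory=set)
    depends_on: list[str] = field(default_factory=list)


def filter_initiatives(items, *, stage=None, confidence=None, tag=None):
    """Return the initiatives that match every filter that was provided."""
    return [
        item for item in items
        if (stage is None or item.stage == stage)
        and (confidence is None or item.confidence == confidence)
        and (tag is None or tag in item.tags)
    ]


def impacted_by(items, name):
    """Return names of initiatives that depend on `name`, directly or transitively."""
    # Invert the depends_on edges: prerequisite -> initiatives that need it.
    dependents = {}
    for item in items:
        for dep in item.depends_on:
            dependents.setdefault(dep, []).append(item.name)

    # Walk the inverted edges from the starting initiative.
    impacted, stack = set(), [name]
    while stack:
        for child in dependents.get(stack.pop(), []):
            if child not in impacted:
                impacted.add(child)
                stack.append(child)
    return sorted(impacted)


# Made-up sample data; these are not real roadmap items.
ROADMAP = [
    Initiative("initiative-a", "stage-a", "high", {"scalability"}),
    Initiative("initiative-b", "stage-a", "medium", {"reliability"}, ["initiative-a"]),
    Initiative("initiative-c", "stage-b", "low", {"reliability"}, ["initiative-b"]),
]

reliability_items = filter_initiatives(ROADMAP, tag="reliability")
delayed_if_a_slips = impacted_by(ROADMAP, "initiative-a")
```

In this sketch, a delay to `initiative-a` would surface both `initiative-b` and `initiative-c` as impacted, which is the kind of question a dependency view helps answer.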

Updating the Roadmap

Changes to the Roadmap are made through merge requests to the infra-roadmap project. The data is stored in YAML format, and changes can be made by editing the YAML. This allows for version control and collaborative discussion through the merge request process.

Full instructions for making changes to the Infrastructure Roadmap are available in the project’s README.md.

Everyone is encouraged to contribute to the roadmap, whether proposing new initiatives or making smaller changes like updating descriptions or adding links to relevant issues.

Design

The Infrastructure Library contains documents that outline our thinking about the problems we are solving and represent the current state of any topic. It plays a significant role in how we produce technical solutions to meet the challenges we face.

Dogfooding

The Infrastructure department uses GitLab and GitLab features extensively as the main tool for operating many environments, including GitLab.com.

We follow the same dogfooding process as the rest of the Engineering function, while keeping the department mission statement as the primary prioritization driver. The prioritization process is aligned with the Engineering function-level prioritization process, which defines where the priority of dogfooding lies with regard to other technical decisions the Infrastructure department makes.

When we consider building tools to help us operate GitLab.com, we follow the 5x rule to determine whether to build the tool as a feature in GitLab or outside of GitLab. To track Infrastructure’s contributions back into the GitLab product, we tag those issues with the appropriate Dogfooding label.

Handbook use at the Infrastructure department

At GitLab, we have a handbook first policy. It is how we communicate process changes, and how we build up a single source of truth for work that is being delivered every day.

The handbook usage guide lists a number of general tips. The ones encountered most frequently in the Infrastructure department are:

  1. The wider community can benefit from training materials, architectural diagrams, technical documentation, and how-to documentation. A good place for this detailed information is in the related project documentation. A handbook page can contain a high level overview, and link to more in-depth information placed in the project documentation.
  2. Think about the audience consuming the material in the handbook. A detailed run through of a GitLab.com operational runbook in the handbook might provide information that is not applicable to self-managed users, potentially causing confusion. Additionally, the handbook is not a go-to place for operational information, and grouping operational information together in a single place while explaining the general context with links as a reference will increase visibility.
  3. Ensure that the handbook pages are easy to consume. Checklists, onboarding, and repeatable tasks should be either automated or created as templates that can be linked from the handbook.
  4. The handbook is the process. The handbook describes our principles, and our epics and issues are our principles put into practice.

Projects

Classification of the Infrastructure department projects is described on the infrastructure department projects page.

The infrastructure issue tracker is the backlog and a catch-all project for the infrastructure teams. It tracks the work our teams are doing that is unrelated to an ongoing change or incident.

In addition to tracking the backlog, Infrastructure Department projects are captured in our Infrastructure Department Epic as well as in our Quarterly Objectives & Key Results.

Supporting Product Features

We have a model that we use to help us support product features. This model provides details on how we collaborate to ship new features to Production.

Stable Counterparts

Infrastructure SREs may be aligned with stage groups as stable counterparts.

Stable Counterparts are used as a framework for managing reliable services at GitLab. The framework provides guidelines for collaboration between Stage Groups and Infrastructure Teams.

Interviewing

The Infrastructure department hires for a number of different technical specialisms and positions across its teams. The Infrastructure Interviewing Guide offers more detail on some of our regular openings, our interview process, and other useful information related to applying for jobs with us. More information on our current openings can be found on the careers page.
