Infrastructure Platforms

The Infrastructure Platforms department is responsible for the availability, reliability, performance, and scalability of GitLab.com and other supporting services.

Mission

The Infrastructure Platforms department enables GitLab (the company) to deliver a single DevOps application, and GitLab SaaS users to focus on generating value for their own businesses by ensuring that we operate an enterprise-grade SaaS platform.

The Infrastructure Platforms department does this by focusing on availability, reliability, performance, and scalability efforts. These responsibilities have cost efficiency as an additional driving force, reinforced by properly prioritized dogfooding efforts.

Many other teams also contribute to the success of the SaaS platform because GitLab.com is not a role. However, it is the responsibility of the Infrastructure Platforms department to drive the ongoing evolution of the SaaS platform, enabled by platform observability data.

Getting Assistance

If you’re a GitLab team member and are looking to alert the Infrastructure Platforms teams about an availability issue with GitLab.com, please find quick instructions to report an incident here: Reporting an Incident.

For all other queries, please see the getting assistance page.

Vision

The Infrastructure Platforms department operates a fast, secure, and reliable SaaS platform to which (and with which) everyone can contribute.

Integral parts of this vision are to:

  1. Build a highly performant team of engineers, combining operational and software development experience to influence the best in reliable infrastructure.
  2. Work publicly in accordance with our transparency value.
  3. Use our own product to prepare, build, deliver work, and support the company strategy.
  4. Align our strategy with the industry trends, company direction, and end customer needs.

Direction

The direction is accomplished by using Objectives and Key Results (OKRs).

Other strategic initiatives to achieve this vision are driven by the needs of enterprise customers looking to adopt GitLab.com. The GitLab.com strategy catalogs top customer requests for the SaaS offering and outlines strategic initiatives across both Infrastructure Platforms and Stage Groups needed to address these gaps.

We are also Product Development

Unlike typical companies, part of the mandates of our Security, Infrastructure, and Support Departments is to contribute to the development of the GitLab Product. This follows from these concepts, many of which are also behaviors attached to our core values:

As such, everyone in the department should be familiar with, and be acting upon, the following statements:

  • We should all feel comfortable contributing to the GitLab open source project
  • If we need something, our first instinct should be to get it into the open source project so it can be given back to the community
  • Try to get it in the open source project first, rather than later, even if it’s 2x harder
  • We should be using the whole product to do our jobs
  • We are all familiar with our Dogfooding process and follow it
  • We should not expect new team members to join the company with these instincts, so we should be willing to teach them
  • It is part of managers’ responsibility to teach these values and behaviors

Organization structure

(click the boxes for more details)

flowchart LR
    I[Infrastructure Platforms]
    click I "/handbook/engineering/infrastructure-platforms/"

    I --> TPM[Technical Program Management]
    click TPM "/handbook/engineering/infrastructure/technical-program-management/"

    I --> EP[Engineering Productivity]
    click EP "/handbook/engineering/infrastructure/engineering-productivity/"
    I --> DA[Data Access]
    click DA "/handbook/engineering/infrastructure/data-access/"
    I --> EA[Engineering Analytics]
    click EA "/handbook/engineering/quality/engineering-analytics/"
    I --> TP[Test Platform]
    click TP "/handbook/engineering/infrastructure/test-platform/"
    I --> SP[SaaS Platforms]
    click SP "/handbook/engineering/infrastructure/platforms/"

    DA --> DF[Database Framework]
    click DF "/handbook/engineering/infrastructure-platforms/data-access/database-framework/"
    DA --> DO[Database Operations]
    click DO "/handbook/engineering/infrastructure-platforms/data-access/database-operations/"
    DA --> Durability
    click Durability "/handbook/engineering/infrastructure-platforms/data-access/durability/"
    DA --> Git
    click Git "/handbook/engineering/infrastructure-platforms/data-access/git/"
    DA --> Gitaly
    click Gitaly "/handbook/engineering/infrastructure-platforms/data-access/gitaly/"

    SP --> DE[Delivery]
    click DE "/handbook/engineering/infrastructure/team/delivery/"
    DE --> Deployments
    DE --> Releases
    SP --> Ops
    click Ops "/handbook/engineering/infrastructure/team/ops/"
    SP --> Foundations
    click Foundations "/handbook/engineering/infrastructure/team/foundations/"
    SP --> Scalability
    click Scalability "/handbook/engineering/infrastructure/team/scalability/"
    Scalability --> Observability
    Scalability --> Practices

    SP --> D[Dedicated]
    click D "/handbook/engineering/infrastructure/team/gitlab-dedicated/"
    D --> E[Environment Automation]
    click E "/handbook/engineering/infrastructure/team/gitlab-dedicated/"
    D --> PSS[Public Sector Services]
    click PSS "/handbook/engineering/infrastructure/team/gitlab-dedicated/us-public-sector-services/"
    D --> Switchboard
    click Switchboard "/handbook/engineering/infrastructure/team/gitlab-dedicated/switchboard/"

    TP --> SMP[Self-Managed Platform]
    click SMP "/handbook/engineering/infrastructure/test-platform/self-managed-platform-team/"
    TP --> TE[Test Engineering]
    click TE "/handbook/engineering/infrastructure/test-platform/test-engineering-team/"
    TP --> TTI[Test and Tools Infrastructure]
    click TTI "/handbook/engineering/infrastructure/test-platform/test-and-tools-infrastructure-team/"

Design

The Infrastructure Library contains documents that outline our thinking about the problems we are solving and represent the current state for any topic. These documents play a significant role in how we produce technical solutions to meet the challenges we face.

Dogfooding

The Infrastructure Platforms department uses GitLab and GitLab features extensively as the main tool for operating many environments, including GitLab.com.

We follow the same dogfooding process as the rest of the Engineering function, while keeping the department mission statement as the primary prioritization driver. The prioritization process is aligned with the Engineering function-level prioritization process, which defines where the priority of dogfooding lies with regard to other technical decisions the Infrastructure Platforms department makes.

When we consider building tools to help us operate GitLab.com, we follow the 5x rule to determine whether to build the tool as a feature in GitLab or outside of GitLab. To track Infrastructure’s contributions back into the GitLab product, we tag those issues with the appropriate Dogfooding label.

Handbook use at the Infrastructure Platforms department

At GitLab, we have a handbook first policy. It is how we communicate process changes, and how we build up a single source of truth for work that is being delivered every day.

The handbook usage page lists a number of general tips. The ones encountered most frequently in the Infrastructure Platforms department are:

  1. The wider community can benefit from training materials, architectural diagrams, technical documentation, and how-to documentation. A good place for this detailed information is in the related project documentation. A handbook page can contain a high level overview, and link to more in-depth information placed in the project documentation.
  2. Think about the audience consuming the material in the handbook. A detailed run through of a GitLab.com operational runbook in the handbook might provide information that is not applicable to self-managed users, potentially causing confusion. Additionally, the handbook is not a go-to place for operational information, and grouping operational information together in a single place while explaining the general context with links as a reference will increase visibility.
  3. Ensure that the handbook pages are easy to consume. Checklists, onboarding steps, and repeatable tasks should be either automated or created in the form of a template that can be linked from the handbook.
  4. The handbook is the process. The handbook describes our principles, and our epics and issues are our principles put into practice.

Projects

Classification of the Infrastructure Platforms department projects is described on the infrastructure department projects page.

The infrastructure issue tracker is the backlog and a catch-all project for the infrastructure teams. It tracks the work our teams are doing that is unrelated to an ongoing change or incident.

In addition to tracking the backlog, Infrastructure Platforms department projects are captured in our Infrastructure Platforms department Epic as well as in our Quarterly Objectives & Key Results.

Supporting Product Features

We have a model that we use to help us support product features. This model provides details on how we collaborate to ship new features to Production.

Ownership

The Infrastructure Platforms team maintains responsibility for the underlying infrastructure on which customer-facing services run. Specific ownership details are in the GitLab Service Ownership Policy.

Stable Counterparts

Infrastructure Platforms SREs may be aligned with stage groups as stable counterparts.

Stable Counterparts are used as a framework for managing reliable services at GitLab. The framework provides guidelines for collaboration between Stage Groups and Infrastructure Platforms Teams.

Interviewing

The Infrastructure Platforms department hires for a number of different technical specialisms and positions across its teams. This Infrastructure Platforms Interviewing Guide offers more detail on some of our regular openings, interview process and other useful information related to applying to jobs with us. More information on our current openings can be found on the careers page.

Slack Channels

General Issue Trackers

Resources

Other Pages


Data Access Sub-department

Vision

Provide other groups with well-designed interfaces and patterns for efficient data access that is scalable, reliable, performant, and sustainable for the long term.

All Team Members

The following people are permanent members of teams that belong to the Data Access Sub-department:

Database Framework

The [Database Framework](/handbook/engineering/infrastructure-platforms/data-access/database-framework/) team develops solutions for scalability, application performance, data growth, and developer enablement, especially where it concerns interactions with the database.

Tenant Scale Group

Vision

The Tenant Scale group is working towards a horizontally scalable, fault-tolerant architecture for GitLab.com. It is accomplishing this by introducing Cells at the infrastructure layer and Organizations at the application layer, along with Geo for end-to-end resiliency.

Team Members

Group Leads

| Name | Role |
|------|------|
| Gerardo Lopez-Fernandez | Engineering Fellow |
| Kamil Trzciński | Senior Distinguished Engineer |
| Steve Xuereb | Staff Site Reliability Engineer |
| Thong Kuah | Principal Engineer |
| Rémy Coutable | Principal Engineer |
| Nick Nguyen | Senior Engineering Manager |

Geo

| Name | Role |
|------|------|
| Lucie Zhao | Manager, Engineering |
| Aakriti Gupta | Senior Backend Engineer |
| Douglas Barbosa Alexandre | Staff Backend Engineer |
| Gabriel Mazetto | Senior Backend Engineer |
| Ian Baum | Senior Backend Engineer |
| Kyle Yetter | Senior Backend Engineer |
| Michael Kozono | Staff Backend Engineer |
| Natanael Silva | Backend Engineer |
| Scott Murray | Backend Engineer |
| Zack Cuddy | Staff Frontend Engineer |

Organizations

| Name | Role |
|------|------|
| Sissi Yao | Backend Engineering Manager, Tenant Scale |
| Abdul Wadood | Senior Backend Engineer, Tenant Scale |
| Alex Pooley | Staff Backend Engineer, Tenant Scale |
| Peter Hegman | Senior Frontend Engineer, Tenant Scale |
| Rutger Wessels | Senior Backend Engineer, Tenant Scale |
| Shubham Kumar | Backend Engineer, Tenant Scale |
| Shane Maglangit | Fullstack Engineer, Tenant Scale |
| Steve Xuereb | Staff Site Reliability Engineer, Tenant Scale |

Cells Infrastructure

| Name | Role |
|------|------|
| Nick Nguyen | Senior Engineering Manager, Data Stores |
| Aaron Richter | Site Reliability Engineer, Cells Infrastructure |
| Bojan Marjanović | Senior Backend Engineer, Cells Infrastructure |
| David Leach | Site Reliability Engineer, Cells Infrastructure |
| Jen-Shin Lin | Senior Backend Engineer, Cells Infrastructure |
| Omar Qunsul | Senior Backend Engineer, Cells Infrastructure |
| Vladimir Glafirov | Senior Site Reliability Engineer, Cells Infrastructure |

Resources

Last modified November 13, 2024: Move gitaly pages over to data access (c16c2006)