Infrastructure Platforms
The Infrastructure Platforms department is responsible for the availability, reliability, performance, and scalability of GitLab SaaS Platforms and supporting services.
Mission
As Infrastructure Platforms, our mission is to enable GitLab to deliver a single DevSecOps platform across SaaS and self-managed platforms by building highly available, reliable, performant, and scalable infrastructure solutions while maintaining the lowest total cost of ownership.
Vision
Deliver industry-leading SaaS solutions, empowering organizations worldwide with the most innovative and efficient DevSecOps platform.
Getting Assistance
If you're a GitLab team member looking to alert the Infrastructure Platforms teams about an availability issue with GitLab.com, see Reporting an Incident for quick instructions.
For all other queries, please see the getting assistance page.
Direction
Initiatives driven within the Platforms section, which often span multiple quarters, are tracked in the SaaS Platforms section epic (accessible to GitLab team members only).
We are also Product Development
Unlike at typical companies, part of the mandate of our Security, Infrastructure, and Support Departments is to contribute to the development of the GitLab Product. This follows from several concepts, many of which are also behaviors attached to our core values.
As such, everyone in the department should be familiar with, and be acting upon, the following statements:
- We should all feel comfortable contributing to the GitLab open source project
- If we need something, our first instinct should be to get it into the open source project so it can be given back to the community
- Try to get it in the open source project first, rather than later, even if it’s 2x harder
- We should be using the whole product to do our jobs
- We are all familiar with our Dogfooding process and follow it
- We should not expect new team members to join the company with these instincts, so we should be willing to teach them
- It is part of managers’ responsibility to teach these values and behaviors
Organization structure
(click the boxes for more details)
flowchart LR
I[Infrastructure Platforms]
click I "/handbook/engineering/infrastructure-platforms/"
I --> DA[Data Access]
click DA "/handbook/engineering/infrastructure-platforms/data-access/"
I --> DE[Developer Experience]
click DE "/handbook/engineering/infrastructure-platforms/developer-experience/"
I --> SP[SaaS Platforms]
click SP "/handbook/engineering/infrastructure/platforms/"
DA --> DF[Database Framework]
click DF "/handbook/engineering/infrastructure-platforms/data-access/database-framework/"
DA --> DO[Database Operations]
click DO "/handbook/engineering/infrastructure-platforms/data-access/database-operations/"
DA --> Durability
click Durability "/handbook/engineering/infrastructure-platforms/data-access/durability/"
DA --> Git
click Git "/handbook/engineering/infrastructure-platforms/data-access/git/"
DA --> Gitaly
click Gitaly "/handbook/engineering/infrastructure-platforms/data-access/gitaly/"
SP --> DEL[Delivery]
click DEL "/handbook/engineering/infrastructure/team/delivery/"
DEL --> Deployments
DEL --> Releases
SP --> Ops
click Ops "/handbook/engineering/infrastructure/team/ops/"
SP --> Foundations
click Foundations "/handbook/engineering/infrastructure/team/foundations/"
SP --> Scalability
click Scalability "/handbook/engineering/infrastructure/team/scalability/"
Scalability --> Observability
Scalability --> Practices
SP --> D[Dedicated]
click D "/handbook/engineering/infrastructure/team/gitlab-dedicated/"
D --> E[Environment Automation]
click E "/handbook/engineering/infrastructure/team/gitlab-dedicated/"
D --> PSS[Public Sector Services]
click PSS "/handbook/engineering/infrastructure/team/gitlab-dedicated/us-public-sector-services/"
D --> Switchboard
click Switchboard "/handbook/engineering/infrastructure/team/gitlab-dedicated/switchboard/"
DE --> DAN[Development Analytics]
click DAN "/handbook/engineering/infrastructure-platforms/developer-experience/development-analytics/"
DE --> DT[Developer Tooling]
click DT "/handbook/engineering/infrastructure-platforms/developer-experience/developer-tooling-team/"
DE --> FR[Feature Readiness]
click FR "/handbook/engineering/infrastructure-platforms/developer-experience"
DE --> PE[Performance Enablement]
click PE "/handbook/engineering/infrastructure-platforms/developer-experience/performance-enablement/"
DE --> TG[Test Governance]
click TG "/handbook/engineering/infrastructure-platforms/developer-experience"
Dogfooding
The Infrastructure Platforms department uses GitLab and GitLab features extensively as the main tool for operating many environments, including GitLab.com.
We follow the same dogfooding process as the rest of the Engineering function, while keeping the department mission statement as the primary prioritization driver. Our prioritization process is aligned with the Engineering function-level prioritization process, which defines where dogfooding ranks relative to the other technical decisions the Infrastructure Platforms department makes.
When we consider building tools to help us operate GitLab.com, we follow the 5x rule to determine whether to build the tool as a feature in GitLab or outside of GitLab. To track Infrastructure's contributions back into the GitLab product, we tag those issues with the appropriate Dogfooding label.
At GitLab, we have a handbook first policy. It is how we communicate process changes, and how we build up a single source of truth for work that is being delivered every day.
The handbook usage page lists a number of general tips. The ones most frequently encountered in the Infrastructure Platforms department are:
- The wider community can benefit from training materials, architectural diagrams, technical documentation, and how-to documentation. A good place for this detailed information is in the related project documentation. A handbook page can contain a high level overview, and link to more in-depth information placed in the project documentation.
- Think about the audience consuming the material in the handbook. A detailed run-through of a GitLab.com operational runbook in the handbook might provide information that is not applicable to self-managed users, potentially causing confusion. Additionally, the handbook is not the go-to place for operational information: keeping operational information together in a single place, and using the handbook to explain the general context with links to it, increases visibility.
- Ensure that handbook pages are easy to consume. Checklists, onboarding steps, and other repeatable tasks should either be automated or turned into a template that can be linked from the handbook.
- The handbook is the process. The handbook describes our principles, and our epics and issues are our principles put into practice.
Projects
Classification of the Infrastructure Platforms department projects is described on the infrastructure department projects page.
The infrastructure issue tracker is the backlog and catch-all project for the infrastructure teams. It tracks the work our teams are doing that is unrelated to an ongoing change or incident.
In addition to tracking the backlog, Infrastructure Platforms department projects are captured in our Infrastructure Platforms department Epic as well as in our Quarterly Objectives & Key Results.
Supporting Product Features
We have a model that we use to help us support product features. This model provides details on how we collaborate to ship new features to Production.
How we work
Communication
Slack
Our main method of communication is Slack.
If you need assistance with a production issue or incident, please see the section on getting assistance.
SaaS Platforms
Channel | Purpose
--- | ---
#infrastructure_-_platforms | We collaborate on department-level items here. This channel is used to share important information with the wider team, and it also helps align all teams in Platforms on common topics.
#g_infrastructure_platforms_leads | Communication for managers. Anyone interested in the topics is welcome to join this channel.
confidential managers channel | Used to discuss staffing issues affecting all teams that require additional coordination. We default to using the public channel as much as possible.
#infrastructure_platforms_social | Our social channel.
The SaaS Platforms group is gradually directing requests for help to the #saas-platforms-help Slack channel.
This channel can be used if it is unclear which Infrastructure team the question should be directed to.
For more information, refer to the landing page for getting assistance.
The #saas-platforms-help channel is monitored by SaaS Platforms Engineering Managers and Staff+ engineers who triage any inbound requests. When triaging this channel, one should locate the team who can best answer this question and instruct the requestor to contact that team using the team’s preferred contact method. When the requestor is connected to the right team, add a green check emoji to the message. Finally, if needed, update the getting assistance page with any changes.
Meetings
Once per week, we hold a Platforms leads call to align on action items related to career development and general direction, and to answer any ongoing questions that have not been addressed async. The call is cancelled if no topics have been added by the morning of the call.
In addition to the Platforms leads call, we have some recurring events and reminders that can be viewed in the SaaS Platforms Leadership Calendar. Please add this calendar to your own to stay up to date with the various events.
Marin Jankovski, Sr. Director of Infrastructure, likes to meet with new team members who join the organization. Marin sets up informal 1:1 coffee chats a few times a month with newer team members to get to know one another and see how they are doing. This process is organized by his EBA, who will reach out to team members once he has availability to meet. As this is a large team, it may take a while to get through everyone.
If you need to meet with Marin before your coffee chat is scheduled, reach out to his EBA, Liki Simonot, to set something up.
Grand Review
The Engineering Leads for each Stage, along with their Product Managers, hold weekly progress reviews to assess their groups’ progress, share project updates, resolve blockers, and celebrate wins. Additionally, the Director of Product and the Senior Director of Infrastructure Platforms conduct a higher-level leadership review, where they go over summaries from these group-level meetings.
Weekly Schedule
- Wednesday: each Epic DRI updates the status section of their epics with the latest progress. It is important to surface:
  - risks and blockers impacting the project
  - completed projects, including a closing summary highlight. These epics will be closed during the Grand Reviews.
- Thursday: Group Level Reviews are conducted and added as threads in the Infrastructure Platforms Top Level epic (see example)
  - Data Access: run by the Data Access Acting Sr. EM and the Group PM
  - Tenant Scale: run by the Tenant Scale Sr. EM and Group PM
  - Production Engineering: run by the Production Engineering Sr. EM and Group PM
  - Software Delivery: run by the Software Delivery Acting Sr. EM and Group PM
  - Developer Experience: run by the Developer Experience Director and a rotation of Product Managers
  - Dedicated: run by the Dedicated Sr. EM and a rotation of Product Managers
- Friday: Leadership Review, run by the Sr. Director of Infrastructure Platforms and the Director of Product Infrastructure Platforms. Review the group-level summaries added as threads in the Infrastructure Platforms Top Level epic, then conduct a deep dive into one specific group to ensure comprehensive project coverage.
- Friday: the group-level and leadership-level reviews are released together in #infrastructure_platforms.
The review is streamed privately to the GitLab Unfiltered channel because it covers confidential issues. All recordings are made available in the Platforms Grand Review YouTube Playlist.
The Infrastructure Platforms Leads Demo is an opportunity for sync discussions between Staff+ ICs across the Infrastructure Platforms Group to highlight ongoing efforts in the teams they support.
All team members are welcome to join the call, but the emphasis is on Staff+ ICs to present and discuss the work they’re focused on, the problems they’re experiencing, and solutions they’re considering.
The call is recorded to the Infrastructure Platforms Leads Demo Unfiltered Playlist. The agenda can be found in Google Docs.
While the intention is for the call to be made public on GitLab Unfiltered, the default is for it to be published as private.
At the end of the call, a quick vote is held between the attendees and if all agree that the content is #SAFE, it can be published as public.
Requests for Help
On the landing page for getting assistance, we ask team-members who need assistance to raise Requests for Help using standard templates.
These issues are raised in the request for help issue tracker and are automatically assigned to the Engineering Manager of the relevant SaaS Platforms team.
The Engineering Manager is expected to:
- Confirm that the question is not a duplicate and that the answer to the question is not already discoverable in the handbook or the tracker itself.
- Confirm the urgency of the request.
- Respond to the help request or assign to an engineer to help with the request.
Slack to GitLab Issue Tracker Integration
To improve the tracking and resolution of requests directed to the Infrastructure team, we are evaluating a bot that converts Slack messages in the #infrastructure_lounge channel into GitLab issues.
Workflow Overview
- Acknowledgement: An agent reacts with the acknowledged_emoji (👀 in our case) to acknowledge a Slack message in the Infrastructure Lounge channel.
- Issue Creation: The Slack bot then creates an issue and assigns it to the acknowledging agent.
- Thread Attachment: The Slack thread corresponding to the message is also posted on the created GitLab issue.
- Label Assignment: Agents can further categorize issues by adding a label emoji (ops, foundations, scalability-observability, or scalability-practices) to the Slack message. This automatically assigns the issue to the respective team: Ops, Foundations, Scalability-Observability, or Scalability-Practices.
- Project Tracking: These converted issues are tracked under a dedicated project hosted at Infrastructure Lounge Slack Issue Tracker.
- Issue Closure: Agents or the requester can close the issue once it is resolved by adding any of the resolved_emojis (green-circle-check, white_check_mark, or checked in our case).
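The workflow above amounts to a mapping from Slack reactions to issue actions. The following Python sketch illustrates that mapping under stated assumptions: the emoji short names (eyes for 👀, plus the label and resolved names listed above), the TrackedIssue class, and the handle_reaction function are all hypothetical, and no real Slack or GitLab API is called. It is not the bot's actual implementation.

```python
from dataclasses import dataclass, field
from typing import Optional

# Emoji short names assumed from the workflow description; the real bot's
# configuration may use different names.
ACKNOWLEDGED_EMOJI = "eyes"  # 👀
RESOLVED_EMOJIS = {"green-circle-check", "white_check_mark", "checked"}
LABEL_EMOJIS = {
    "ops": "Ops",
    "foundations": "Foundations",
    "scalability-observability": "Scalability-Observability",
    "scalability-practices": "Scalability-Practices",
}


@dataclass
class TrackedIssue:
    """Hypothetical in-memory stand-in for the GitLab issue created from a Slack message."""
    assignee: str
    team: Optional[str] = None
    closed: bool = False
    thread: list = field(default_factory=list)


def handle_reaction(issues: dict, message_ts: str, emoji: str, user: str, thread: list) -> str:
    """Apply one Slack reaction to the issue tracked for message_ts and return the action taken."""
    issue = issues.get(message_ts)
    if emoji == ACKNOWLEDGED_EMOJI and issue is None:
        # Acknowledgement: create the issue, assign the acknowledging agent,
        # and attach a copy of the Slack thread.
        issues[message_ts] = TrackedIssue(assignee=user, thread=list(thread))
        return "created"
    if issue is None:
        # Labels and closure only apply once a message has been acknowledged.
        return "ignored"
    if emoji in LABEL_EMOJIS:
        # Label assignment routes the issue to the matching team.
        issue.team = LABEL_EMOJIS[emoji]
        return "labelled"
    if emoji in RESOLVED_EMOJIS and not issue.closed:
        # Any resolved emoji closes an open issue.
        issue.closed = True
        return "closed"
    return "ignored"
```

In a real integration, handle_reaction would be driven by Slack reaction events and would call the GitLab API instead of mutating an in-memory dictionary. Keying on the message timestamp reflects how Slack identifies a message within a channel.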
Configuration
Agents responsible for handling these issues are defined in a JSON file, which is stored as a CI/CD variable. Currently, this file contains a static list of all members of the infrastructure department.
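A minimal sketch of how a bot might read the agents list, assuming a hypothetical CI/CD variable named SLACK_BOT_AGENTS that holds a JSON object mapping Slack user IDs to GitLab usernames. Both the variable name and the JSON shape are assumptions, not the bot's documented format. GitLab exposes CI/CD variables to jobs as environment variables, so the sketch reads from os.environ by default.

```python
import json
import os
from typing import Mapping, Optional

# Hypothetical variable name; the real CI/CD variable name is not documented here.
AGENTS_VARIABLE = "SLACK_BOT_AGENTS"


def load_agents(env: Mapping[str, str] = os.environ) -> dict:
    """Parse the agents JSON (assumed shape: Slack user ID -> GitLab username)."""
    raw = env.get(AGENTS_VARIABLE)
    if not raw:
        raise RuntimeError(f"CI/CD variable {AGENTS_VARIABLE} is not set")
    agents = json.loads(raw)
    # Validate the assumed shape so a malformed variable fails loudly at startup.
    if not isinstance(agents, dict) or not all(
        isinstance(k, str) and isinstance(v, str) for k, v in agents.items()
    ):
        raise ValueError("expected a JSON object mapping Slack user IDs to GitLab usernames")
    return agents


def gitlab_assignee(agents: dict, slack_user_id: str) -> Optional[str]:
    """Return the GitLab username for an acknowledging agent, or None if they are not listed."""
    return agents.get(slack_user_id)
```

If the variable is defined as a file-type CI/CD variable, the environment variable holds a path to a temporary file rather than the JSON itself, so the sketch would need to read that file first.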
Project and Backlog Management
We use epics and issues to manage our work. Our project management process is shared between all teams in SaaS Platforms.
The Platforms section builds and maintains various tools to help deploy, operate and monitor our SaaS platforms. You can view a list of these tools in the Platforms Tools Index.
OKR
We use objectives and key results (OKRs) to set goals in alignment with OKRs at GitLab.
Our OKR process is shared between all teams in SaaS Platforms.
Hiring and Interviewing
Our hiring process is shared between all teams in Infrastructure Platforms.
The Infrastructure Platforms Interviewing Guide offers more detail on some of our regular openings, the interview process, and other useful information related to applying to jobs with us. More information on our current openings can be found on the careers page.
All team members are encouraged to schedule time for personal development. The following links may help you get started with Platforms-relevant learning. Please add your own contributions to this list to help others with their personal development.
- Jsonnet tutorial
Other Pages
Data Access
Vision
Provide other groups with well-designed interfaces and patterns for efficient data access that is scalable, reliable, performant, and sustainable for the long term.
All Team Members
The following people are permanent members of teams that belong to the Data Access Sub-department:
Database Framework
The Database Framework team develops solutions for scalability, application performance, data growth, and developer enablement, especially where it concerns interactions with the database.
Developer Experience is a newly formed group, created by merging the Engineering Productivity team and the Test Platform sub-department. This combination allows us to take a holistic approach to delivering Platform capabilities.
The GitLab Delivery Stage focuses on enhancing the reliability, efficiency, and speed of GitLab’s end-to-end software delivery across all platforms and offerings.
Tenant Scale
Vision
The Tenant Scale group is working towards a horizontally scalable, fault-tolerant architecture for GitLab.com. It is accomplishing this by introducing Cells at the infrastructure layer and Organizations at the application layer, along with Geo for end-to-end resiliency.
Team Members
Group Leads
Name | Role
--- | ---
Gerardo Lopez-Fernandez | Engineering Fellow
Kamil Trzciński | Senior Distinguished Engineer
Steve Xuereb | Staff Site Reliability Engineer
Thong Kuah | Principal Engineer
Rémy Coutable | Principal Engineer
Nick Nguyen | Senior Engineering Manager