The Infrastructure Platforms Section

Mission

The Infrastructure Platforms section enables GitLab Engineering to build and deliver safe, scalable and efficient features for multi-tenant and single-tenant GitLab SaaS platforms (GitLab.com and GitLab Dedicated).

Vision

To deliver on the mission, we are in the process of formalising the building blocks we need to work on.

Direction

In FY25, teams in the Platforms Section of the Infrastructure Department collaborated on the “North Star”, which is used to set the SaaS Platforms Strategy.

Initiatives driven within the Platforms section, often spanning multiple quarters, are represented on the SaaS Platforms section epic.

How we work

Communication

Slack

Our main method of communication is Slack.

If you need assistance with a production issue or incident, please see the section on getting assistance.

SaaS Platforms

Channel Purpose
#s_platforms We collaborate on section level items here. This channel is used to share important information with the wider team, and also serves to align all teams in Platforms on common topics.
#g_saas_platforms_leads Communication for managers. Anyone interested in the topics is welcome to join this channel.
confidential managers channel Used to discuss staffing issues affecting all teams that require additional coordination. We default to using the public channel as far as possible.
#s_platforms_social Our social channel.

Dedicated

Channel Purpose
#g_dedicated-team Dedicated Group discussion channel. Please use this channel for discussions relevant to engineers across the Dedicated group
#f_gitlab_dedicated Dedicated function channel. Please use this channel to ask questions about features or ways of using the Dedicated product. Dedicated group will use this channel to make announcements relevant to wider groups
#g_dedicated-us-pubsec Dedicated USPubSec team channel. Used to discuss topics that affect PubSec team only. For broader engineering discussions please use #g_dedicated-team
#g_dedicated-switchboard-team Dedicated Switchboard team channel. Used to discuss topics that affect Switchboard team only. For broader engineering discussions please use #g_dedicated-team
#g_dedicated-environment-automation-team Dedicated Environment Automation team channel. Used to discuss topics that affect Environment Automation team only. For broader engineering discussions please use #g_dedicated-team
#g_dedicated-team-social Dedicated social channel
#dedicated-mr-review-stream Visibility of new merge requests on Dedicated repos

Delivery

Channel Purpose
#g_delivery Delivery Group channel
#g_delivery_standups
#delivery_social Social channel for the group.
#releases General communication about the current Release/Patch
#f_upcoming_release Detailed Release status / Release Manager channel
#announcements Release-Tools automation posts related to deployment activity

Production Engineering

Channel Purpose
#s_production_engineering General conversation for Production Engineering teams and requests coming in from other team members.
#g_production_engineering_leads Channel for Production Engineering leads (staff+ and management)
#g_infra_ops Team channel for Production Engineering Ops
#g_infra_social Social channel for Production Engineering
#g_foundations Team channel for Production Engineering Foundations
#g_foundations_social Social channel for the Foundations team
#g_foundations_alerts Non-urgent service alerts for Foundations owned services
#g_foundations_notifications Renovate notifications for Foundations owned projects
#infra-terraform-alerts Terraform state drift alerts for SaaS infrastructure

Scalability

Channel Purpose
#g_scalability General conversation for Scalability and requests coming in from other team members.
confidential managers channel Used for specific communication. We default to public channels.
#g_scalability-observability Team channel for general work in Observability.
#g_scalability-practices Team channel for general work in Practices.
#scalability-social Our social channel.
#scalability-id-project-name() We use project specific channels to make it easier to follow specific topics. Channel names follow this format.

The SaaS Platforms group is gradually directing requests for help to the #saas-platforms-help Slack channel. This channel can be used if it is unclear which Infrastructure team the question should be directed to. For more information, refer to the landing page for getting assistance.

The #saas-platforms-help channel is monitored by SaaS Platforms Engineering Managers and Staff+ engineers who triage any inbound requests. When triaging this channel, locate the team that can best answer the question and instruct the requestor to contact that team using the team’s preferred contact method. Once the requestor is connected to the right team, add a green check emoji to the message. Finally, if needed, update the getting assistance page with any changes.

Meetings

Once per week, we hold a Platforms leads call to align on action items related to career development and general direction, or to answer any ongoing questions that have not been addressed async. The call is cancelled if no topics have been added by the morning of the call.

In addition to the Platforms leads call, we have some recurring events and reminders that can be viewed in the SaaS Platforms Leadership Calendar. Please add this to your Calendars to stay up-to-date with the various events.

Sr. Director of Infrastructure Marin Jankovski likes to meet with new team members who join the organization. Marin sets up informal 1:1 coffee chats a few times a month with newer team members to get to know one another and see how they are doing. This process is organized by his EBA, who will reach out to team members once he has the availability to meet. As this is a large team, it may take a while to get through everyone. If you need to meet with Marin sooner than your scheduled coffee chat, you can reach out to his EBA Liki Simonot to set something up.

Grand Review

The Engineering Leads for each Stage, along with their Product Managers, hold weekly progress reviews to assess their groups’ progress, share project updates, resolve blockers, and celebrate wins. Additionally, the Director of Product and the Senior Director of Infrastructure Platforms conduct a higher-level leadership review, where they go over summaries from these group-level meetings.

Weekly Schedule

  • Wednesday: each Epic DRI updates the status section of their epics with the progress. It is important to surface:
    • risks and blockers impacting the project
    • projects that are completed, including a closing summary highlight. These epics will be closed during the Grand Reviews
  • Thursday: Group Level Reviews are conducted and added as threads in the Infrastructure Platforms Top Level epic (see example)
    • Data Access: run by the Data Access Acting Sr. EM and the Group PM
    • Tenant Scale: run by the Tenant Scale Sr. EM and Group PM
    • Production Engineering: run by the Production Engineering Sr. EM and Group PM
    • Software Delivery: run by the Software Delivery Acting Sr. EM and Group PM
    • Developer Experience: run by the Developer Experience Director and a rotation of Product Managers
    • Dedicated: run by the Dedicated Sr. EM and a rotation of Product Managers
  • Friday: Leadership Review, run by the Sr. Director of Infrastructure Platforms and the Director of Product Infrastructure Platforms. Review the group-level summaries added as threads in the Infrastructure Platforms Top Level epic, then conduct a deep dive into one specific group to ensure comprehensive project coverage.
  • Friday: Group level and the leadership level reviews are released together in #infrastructure_platforms

The review is privately streamed to the GitLab Unfiltered channel because the review covers confidential issues. All recordings are made available in the Platforms Grand Review YouTube Playlist.

Infrastructure Platforms Leads Demo

The Infrastructure Platforms Leads Demo is an opportunity for sync discussions between Staff+ ICs across the Infrastructure Platforms Group to highlight ongoing efforts in the teams they support. All team members are welcome to join the call, but the emphasis is on Staff+ ICs to present and discuss the work they’re focused on, the problems they’re experiencing, and solutions they’re considering.

The call is recorded to the Infrastructure Platforms Leads Demo Unfiltered Playlist. The agenda can be found in Google Docs.

While the intention is for the call to be made public on GitLab Unfiltered, the default is for it to be published as private. At the end of the call, a quick vote is held between the attendees and if all agree that the content is #SAFE, it can be published as public.

Requests for Help

On the landing page for getting assistance, we ask team-members who need assistance to raise Requests for Help using standard templates.

These issues are raised in the request for help issue tracker and are automatically assigned to the Engineering Manager of the relevant SaaS Platforms team.

The Engineering Manager is expected to:

  1. Confirm that the question is not a duplicate and that the answer to the question is not already discoverable in the handbook or the tracker itself.
  2. Confirm the urgency of the request.
  3. Respond to the help request or assign to an engineer to help with the request.

Slack to GitLab Issue Tracker Integration

To improve the tracking and resolution of requests directed to the Infrastructure team, we are evaluating a bot that converts Slack messages in the #infrastructure_lounge channel into GitLab issues.

Workflow Overview

  • Acknowledgement: An agent responds with the acknowledged_emoji (👀 in our case) to acknowledge a Slack message in the Infrastructure Lounge channel.
  • Issue Creation: The Slack bot then creates an issue with the acknowledging agent assigned to it.
  • Thread Attachment: The Slack thread corresponding to the message is also posted on the created GitLab issue.
  • Label Assignment: Agents can further categorize issues by adding label emojis (ops, foundations, scalability-observability or scalability-practices) in the Slack message. This action automatically assigns the issue to the respective team: Ops, Foundations, Scalability-Observability or Scalability-Practices.
  • Project Tracking: These converted issues are tracked under a dedicated project hosted at Infrastructure Lounge Slack Issue Tracker.
  • Issue Closure: Agents/Requester can close the issue when resolved by adding any of the resolved_emojis (green-circle-check, white_check_mark or checked in our case).
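
The workflow above can be sketched as a small reaction handler. This is only an illustration of the described steps, not the bot's actual code: the `Issue` class, the `handle_reaction` function and the `agents` set are hypothetical names, `eyes` is assumed to be the Slack name of the 👀 emoji, and the sketch assumes that any user may close an issue while only agents may acknowledge or label it.

```python
from dataclasses import dataclass
from typing import Optional, Set

# Emoji names taken from the workflow description; "eyes" is the Slack name for 👀.
ACKNOWLEDGED_EMOJI = "eyes"
LABEL_EMOJIS = {
    "ops": "Ops",
    "foundations": "Foundations",
    "scalability-observability": "Scalability-Observability",
    "scalability-practices": "Scalability-Practices",
}
RESOLVED_EMOJIS = {"green-circle-check", "white_check_mark", "checked"}


@dataclass
class Issue:
    """Hypothetical stand-in for the GitLab issue created by the bot."""
    assignee: str
    team: Optional[str] = None
    closed: bool = False


def handle_reaction(emoji: str, user: str, issue: Optional[Issue],
                    agents: Set[str]) -> Optional[Issue]:
    """Return the issue state after one emoji reaction on a Slack message."""
    if issue is None:
        # Acknowledgement: an agent's 👀 creates an issue assigned to that agent.
        if emoji == ACKNOWLEDGED_EMOJI and user in agents:
            return Issue(assignee=user)
        return None
    if emoji in LABEL_EMOJIS and user in agents:
        # Label assignment: route the issue to the matching team.
        issue.team = LABEL_EMOJIS[emoji]
    elif emoji in RESOLVED_EMOJIS:
        # Issue closure: assumed here to be open to agents and requesters alike.
        issue.closed = True
    return issue
```

For example, `handle_reaction("eyes", "alice", None, {"alice"})` would create an issue assigned to `alice`, and a later `foundations` reaction from an agent would set its team to Foundations.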

Configuration

Agents responsible for handling these issues are defined in a JSON file, which serves as a CI/CD variable. Currently, this file contains a static list of all members of the infrastructure department.
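
As a rough sketch of how such a list might be consumed, the snippet below parses the JSON content into a set of agents. The schema is an assumption (a flat JSON array of usernames), and `load_agents` is a hypothetical helper rather than part of the bot.

```python
import json
from typing import Set


def load_agents(raw: str) -> Set[str]:
    """Parse the agent list from JSON text.

    Assumes the content is a flat JSON array of usernames; the real
    file may use a different structure.
    """
    data = json.loads(raw)
    if not isinstance(data, list) or not all(isinstance(name, str) for name in data):
        raise ValueError("expected a JSON array of usernames")
    # A set removes duplicates and makes membership checks cheap.
    return set(data)
```

Because the list is static, it has to be updated by hand whenever someone joins or leaves the department.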

Project and Backlog Management

We use epics and issues to manage our work. Our project management process is shared between all teams in SaaS Platforms.

Tools

The Platforms section builds and maintains various tools to help deploy, operate and monitor our SaaS platforms. You can view a list of these tools in the Platforms Tools Index.

OKR

We use objectives and key results to set goals in alignment with OKRs at GitLab. Our OKR process is shared between all teams in SaaS Platforms.

Hiring

Our hiring process is shared between all teams in SaaS Platforms.

Platforms Learning Path

All team members are encouraged to schedule time for personal development. The following links may help you get started with Platforms-relevant learning. Please add your own contributions to this list to help others with their personal development.

Learn about Platforms, and the Platforms Groups

Group Topic
SaaS Platforms Product direction
Delivery Group Mission, Strategy, Team history
Scalability Group Mission, Strategy, Team history
Dedicated Group Mission

Learn about tools and technologies used within Platforms

  1. Jsonnet tutorial
  2. GitLab.com running on the Kubernetes platform

Infrastructure Platforms Tools Index

Tools

The Platforms section builds and maintains various tools to help deploy, operate and monitor our SaaS platforms. The below table is an index to help with the discovery and organization of tools that are actively maintained:

Tool Description
Tamland Capacity planning forecasts for GitLab.com
Stage Group Ownership Index Index of stage groups and their owned objects
Stage Group Error Budgets Objective metrics to determine the reliability of a service
Service Maturity Model Overview of each service’s operating capabilities
Runway GitLab’s internal Platform as a Service implementation

SaaS Platforms Processes

Processes

Process Details
Calibration Details and Schedule of SaaS Platforms annual calibration

The Infrastructure SaaS Platforms Hiring Process

The Infrastructure SaaS Platforms group hiring process and resources

The Infrastructure SaaS Platforms OKRs

OKRs in SaaS Platforms

Creating OKRs

OKRs (or other items outside of projects) that require progress tracking should be updated every Wednesday.

When writing OKRs, the guidance is that:

  • Objective is defined as “What do you want to achieve?”
  • Key Results is defined as “How will you know when you’ve achieved the objective?”
  • As part of a KR, you can also have a sub point - which will likely tie to an epic. This would be an “Initiative”, defined as “How are you going to achieve your key result?”

Objectives

The description for an Objective should have the following format:

The Infrastructure SaaS Platforms Project Management

Project Management in SaaS Platforms

We use GitLab epics and issues to communicate the progress and status of our work. The SaaS Platforms epic indexes the top-level epic for each team and links to the active OKRs for a given quarter. All teams in SaaS Platforms follow these guidelines so that it is easy for team members to contribute to different projects if needed.

Projects are reviewed weekly in the Grand Review.

Every Wednesday, the DRI for a project is expected to update the status block in the epic description to: