The Infrastructure Platforms department is responsible for the availability, reliability, performance, and scalability of GitLab SaaS Platforms and supporting services.
Mission
As Infrastructure Platforms, our mission is to enable GitLab to deliver a single DevSecOps platform across SaaS and self-managed platforms by building highly available, reliable, performant, and scalable infrastructure solutions while maintaining the lowest total cost of ownership.
Vision
Deliver industry-leading SaaS solutions, empowering organizations worldwide with the most innovative and efficient DevSecOps platform.
Getting Assistance
If you’re a GitLab team member and are looking to alert the Infrastructure Platforms teams about an availability issue with GitLab.com, please find quick instructions to report an incident here: Reporting an Incident.
Initiatives driven within the Platforms section, often spanning multiple quarters, are represented on the SaaS Platforms section epic (GitLab team member).
We are also Product Development
Unlike typical companies, part of the mandates of our Security, Infrastructure, and Support Departments is to contribute to the development of the GitLab Product. This follows from these concepts, many of which are also behaviors attached to our core values:
We should not expect new team members to join the company with these instincts, so we should be willing to teach them
It is part of managers’ responsibility to teach these values and behaviors
Organization structure
Dogfooding
The Infrastructure Platforms department uses GitLab and GitLab features extensively as the main tool for operating many environments, including GitLab.com.
When we consider building tools to help us operate GitLab.com, we follow the 5x rule to determine whether to build the tool as a feature in GitLab or outside of GitLab. To track Infrastructure’s contributions back into the GitLab product, we tag those issues with the appropriate Dogfooding label.
Handbook use at the Infrastructure Platforms department
At GitLab, we have a handbook first policy. It is how we communicate process changes, and how we build up a single source of truth for work that is being delivered every day.
The handbook usage guide lists a number of general tips. The ones encountered most frequently in the Infrastructure Platforms department are:
The wider community can benefit from training materials, architectural diagrams, technical documentation, and how-to documentation. A good place for this detailed information is in the related project documentation. A handbook page can contain a high level overview, and link to more in-depth information placed in the project documentation.
Think about the audience consuming the material in the handbook. A detailed run through of a GitLab.com operational runbook in the handbook might provide information that is not applicable to self-managed users, potentially causing confusion. Additionally, the handbook is not a go-to place for operational information, and grouping operational information together in a single place while explaining the general context with links as a reference will increase visibility.
Ensure that the handbook pages are easy to consume. Checklists, onboarding steps, and repeatable tasks should be either automated or captured in a template that can be linked from the handbook.
The handbook is the process. The handbook describes our principles, and our epics and issues are our principles put into practice.
The infrastructure issue tracker is the backlog and a catch-all project for the infrastructure teams. It tracks the work our teams are doing that is unrelated to an ongoing change or incident.
We collaborate on department-level items here. This channel is used to share important information with the wider team, and also serves to align all teams in Platforms on common topics.
Dedicated function channel. Please use this channel to ask questions about features or ways of using the Dedicated product. The Dedicated group will use this channel to make announcements relevant to wider groups.
Dedicated Switchboard team channel. Used to discuss topics that affect Switchboard team only. For broader engineering discussions please use #g_dedicated-team
Dedicated Environment Automation team channel. Used to discuss topics that affect the Environment Automation team only. For broader engineering discussions please use #g_dedicated-team
Channel for cross-functional discussion and coordination on Cells and Organizations.
The SaaS Platforms group is gradually directing requests for help to the #saas-platforms-help Slack channel.
This channel can be used if it is unclear which Infrastructure team the question should be directed to.
For more information, refer to the landing page for getting assistance.
The #saas-platforms-help channel is monitored by SaaS Platforms Engineering Managers and Staff+ engineers who triage any inbound requests. When triaging this channel, one should locate the team who can best answer this question and instruct the requestor to contact that team using the team’s preferred contact method. When the requestor is connected to the right team, add a green check emoji to the message. Finally, if needed, update the getting assistance page with any changes.
Meetings
Once per week, we hold a Platforms leads call to align on action items related to career development and general direction, or to answer any ongoing questions that have not been addressed async. The call is cancelled if no topics have been added by the morning of the call.
In addition to the Platforms leads call, we have some recurring events and reminders that can be viewed in the SaaS Platforms Leadership Calendar. Please add this to your calendar to stay up-to-date with the various events.
Sr. Director of Infrastructure Marin Jankovski likes to meet with new team members who join the organization. Marin sets up informal 1:1 coffee chats a few times a month with newer team members to get to know one another and see how they are doing. This process is organized by his EBA, who will reach out to team members once he has the availability to meet. As this is a large team, it may take a while to get through everyone.
If you need to meet with Marin sooner than your coffee chat is scheduled, you can reach out to his EBA, Liki Simonot, to set something up.
Grand Review
The Engineering Leads for each Stage, along with their Product Managers, hold weekly progress reviews to assess their groups’ progress, share project updates, resolve blockers, and celebrate wins. Additionally, the Director of Product and the Senior Director of Infrastructure Platforms conduct a higher-level leadership review, where they go over summaries from these group-level meetings.
Weekly Schedule
Wednesday: each Epic DRI updates the status section of their epics with the progress. It is important to surface:
risks and blockers impacting the project
projects that are completed, including a closing summary highlight. These epics will be closed during the Grand Reviews
Data Access: run by the Data Access Acting Sr. EM and the Group PM
Tenant Scale: run by the Tenant Scale Sr. EM and Group PM
Production Engineering: run by the Production Engineering Sr. EM and Group PM
Software Delivery: run by the Software Delivery Acting Sr. EM and Group PM
Developer Experience: run by the Developer Experience Director and a rotation of Product Managers
Dedicated: run by the Dedicated Sr. EM and a rotation of Product Managers
Friday: Leadership Review, run by the Sr. Director of Infrastructure Platforms and the Director of Product Infrastructure Platforms. Review the group-level summaries added as threads in the Infrastructure Platforms Top Level epic, then conduct a deep dive into one specific group to ensure comprehensive project coverage.
Friday: The group-level and leadership-level reviews are released together in #infrastructure_platforms
The review is privately streamed to the GitLab Unfiltered channel because it covers confidential issues. All recordings are made available in the Platforms Grand Review YouTube Playlist.
Infrastructure Platforms Leads Demo
The Infrastructure Platforms Leads Demo is an opportunity for sync discussions between Staff+ ICs across the Infrastructure Platforms Group to highlight ongoing efforts in the teams they support.
All team members are welcome to join the call, but the emphasis is on Staff+ ICs to present and discuss the work they’re focused on, the problems they’re experiencing, and solutions they’re considering.
While the intention is for the call to be made public on GitLab Unfiltered, the default is for it to be published as private.
At the end of the call, a quick vote is held between the attendees and if all agree that the content is #SAFE, it can be published as public.
These issues are raised in the request for help issue tracker and are automatically assigned to the Engineering Manager of the relevant SaaS Platforms team.
The Engineering Manager is expected to:
Confirm that the question is not a duplicate and that the answer to the question is not already discoverable in the handbook or the tracker itself.
Confirm the urgency of the request.
Respond to the help request or assign to an engineer to help with the request.
Slack to GitLab Issue Tracker Integration
In an effort to enhance the tracking and resolution of requests directed to the Infrastructure team, we are evaluating a bot that converts Slack messages in the #infrastructure_lounge channel into GitLab issues.
Workflow Overview
Acknowledgement: An agent responds with the acknowledged_emoji (👀 in our case) to acknowledge a Slack message in the Infrastructure Lounge channel.
Issue Creation: The Slack bot then creates an issue with the acknowledging agent assigned to it.
Thread Attachment: The Slack thread corresponding to the message is also posted on the created GitLab issue.
Label Assignment: Agents can further categorize issues by adding label emojis (ops, foundations, observability) in the Slack message. This action automatically assigns the issue to the respective team: Ops, Foundations, Observability.
Issue Closure: Agents/Requester can close the issue when resolved by adding any of the resolved_emojis (green-circle-check, white_check_mark, or checked in our case).
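The workflow steps above can be sketched as a small reaction handler. This is only an illustration under stated assumptions, not the bot's actual code: the `Issue` class, the `handle_reaction` function, and its parameters are hypothetical, the issue is kept in memory instead of being created through the GitLab API, and `eyes` is assumed to be the Slack name of the 👀 emoji.

```python
from dataclasses import dataclass, field
from typing import Optional

# Emoji names come from the workflow description; "eyes" is assumed to be 👀.
ACKNOWLEDGED_EMOJI = "eyes"
LABEL_EMOJIS = {"ops": "Ops", "foundations": "Foundations", "observability": "Observability"}
RESOLVED_EMOJIS = {"green-circle-check", "white_check_mark", "checked"}


@dataclass
class Issue:
    """Hypothetical in-memory stand-in for a GitLab issue."""
    title: str
    assignee: str
    thread: list = field(default_factory=list)
    team: Optional[str] = None
    state: str = "opened"


def handle_reaction(issue, emoji, user, *, message, requester, thread, agents):
    """Apply one emoji reaction and return the (possibly new) issue.

    `issue` is None until an agent acknowledges the Slack message.
    """
    if issue is None:
        # Acknowledgement + issue creation + thread attachment.
        if emoji == ACKNOWLEDGED_EMOJI and user in agents:
            return Issue(title=message, assignee=user, thread=list(thread))
        return None
    if emoji in LABEL_EMOJIS and user in agents:
        # Label assignment routes the issue to the matching team.
        issue.team = LABEL_EMOJIS[emoji]
    elif emoji in RESOLVED_EMOJIS and (user in agents or user == requester):
        # Issue closure by an agent or the original requester.
        issue.state = "closed"
    return issue
```

For example, with `agents = {"alice"}` and a message posted by `bob`, a 👀 reaction from `alice` creates the issue assigned to her, an `ops` reaction from her routes it to the Ops team, and a `white_check_mark` reaction from `bob` closes it.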
Configuration
Agents responsible for handling these issues are defined in a JSON file, which serves as a CI/CD variable. Currently, this file contains a static list of all members of the infrastructure department.
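As a rough sketch of how such a configuration could be read, the snippet below loads the agent list from a JSON file whose path is stored in an environment variable. The variable name `INFRA_AGENTS_FILE`, the `load_agents` helper, and the file layout (a flat JSON array of usernames) are all assumptions made for illustration; the real file format is not described here.

```python
import json
import os
from pathlib import Path


def load_agents(env=os.environ, var_name="INFRA_AGENTS_FILE"):
    """Return the set of agent usernames listed in the configured JSON file.

    Assumes (hypothetically) that `var_name` holds the path to a file
    containing a flat JSON array of usernames, e.g. ["alice", "bob"].
    """
    path = env.get(var_name)
    if not path:
        raise RuntimeError(f"{var_name} is not set")
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    if not isinstance(data, list) or not all(isinstance(u, str) for u in data):
        raise ValueError("expected a JSON array of usernames")
    return set(data)
```

Returning a set keeps membership checks cheap when the bot decides whether a reacting user is an agent. Because the list is static, it needs to be updated by hand whenever department membership changes.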
Project and Backlog Management
We use epics and issues to manage our work. Our project management process is shared between all teams in SaaS Platforms.
Tools
The Platforms section builds and maintains various tools to help deploy, operate and monitor our SaaS platforms. You can view a list of these tools in the Platforms Tools Index.
OKR
We use Objectives and Key Results (OKRs) to set goals in alignment with OKRs at GitLab.
Our OKR process is shared between all teams in SaaS Platforms.
Hiring and Interviewing
Our hiring process is shared between all teams in Infrastructure Platforms.
The Infrastructure Platforms Interviewing Guide offers more detail on some of our regular openings, the interview process, and other useful information about applying for jobs with us. More information on our current openings can be found on the careers page.
Platforms Learning Path
All team members are encouraged to schedule time for personal development. The following links may help you get started with Platforms-relevant learning. Please add your own contributions to this list to help others with their personal development.
Learn about Infrastructure Platforms, and its groups
The Data Access sub-department is responsible for the sustainability and availability of access to GitLab’s user data, in alignment with customer needs and GitLab’s business objectives.
The scope of user data includes Git, PostgreSQL, ClickHouse, Redis, Object Storage and the development of a scalable backup system for all GitLab deployments.
For all GitLab deployments:
We design, operate and evolve GitLab’s data storage architecture and interfaces, or provide assistance to those responsible.
We guide feature owners in reaching business goals safely, throughout the feature life cycle.
We aid customers directly in incidents or escalations, and indirectly by innovating to meet their needs.
It is the job of each Data Access team to hold feature owners accountable for responsible access patterns and to thereby ensure the stability of our shared data storage systems. This is an active process and requires building relationships for collaboration, guiding through paved paths, and providing tools and knowledge Team Members can use and build on.
Developer Experience is a newly formed group, born from the strategic merger of the Engineering Productivity team and the Test Platform sub-department. This exciting combination allows us to take a holistic approach to delivering cutting-edge Platform capabilities.
The GitLab Delivery Stage focuses on enhancing the reliability, efficiency, and speed of GitLab’s end-to-end software delivery across all platforms and offerings.
The Tenant Scale group is working towards a horizontally scalable, fault-tolerant architecture for GitLab.com. It is accomplishing this by introducing Cells at the infrastructure layer and Organizations at the application layer, along with Geo for end-to-end resiliency.