Verify:Runner
The GitLab Runner team page.
Vision
By 2025, our vision for GitLab Runner is that the runner’s setup and day-to-day operations at scale be an almost zero-friction experience.
Mission
Our mission is to enable organizations to efficiently run GitLab CI/CD jobs on any computing platform and do so in an operationally efficient and highly secure way at any scale.
This team maps to the Verify DevOps stage.
Product Strategy and Roadmap
The product strategy and roadmap for the runner product categories are covered on the following direction pages.
UX strategy
Our UX vision, more information around how UX and Development collaborate, and other UX-related information will be documented in the UX Strategy page.
Our Jobs to be Done are documented in Verify:Runner JTBD and provide a high-level view of the main objectives. Our User Stories are documented in Runner Group - User Stories which guide our solutions as we create design deliverables, and ultimately map back to JTBDs.
In the OPS section, we continuously define, measure, analyze, and iterate on Performance Indicators (PIs). One of the PI process goals is to ensure that, as a product team, we are focused on strategic and operational improvements that move leading indicators, the precursors of future success.
Team Members
The following people are permanent members of the Verify:Runner group:
| Name | Role |
| --- | --- |
| Nicole Williams | Senior Engineering Manager, Verify:Runner |
| Adrien Kohlbecker | Senior Backend Engineer, Verify:Runner |
| Arran Walker | Senior Backend Engineer, Verify:Runner |
| Axel von Bertoldi | Senior Backend Engineer, Verify:Runner |
| Cam Swords | Staff Backend Engineer, Verify:Runner |
| Davis Bickford | Backend Engineer, Verify:Runner |
| Georgi Georgiev | Senior Backend Engineer, Verify:Runner |
| Hannes Hörl | Backend Engineer, Verify:Runner |
| Joe Shaw | Senior Backend Engineer, Verify:Runner |
| Joe Burnett | Principal Engineer, Verify:Runner |
| Miguel Rincon | Staff Frontend Engineer, Verify:Runner |
| Pedro Pombeiro | Senior Backend Engineer, Verify:Runner |
| Romuald Atchadé | Backend Engineer, Verify:Runner |
| Tomasz Maczukin | Senior Backend Engineer, Verify:Runner |
Stable Counterparts
For a more comprehensive list of counterparts, look at the runner product category.
Dashboards
Projects we maintain
As a team we maintain several projects. The https://gitlab.com/gitlab-com/runner-maintainers group
is added to each project with maintainer permission. We also try to align tools and versions used across them.
Product projects
Runner component projects
CI Steps projects
Helper projects
- Linters
- Testing
- Release
- Maintenance
Runner SaaS projects
- Images
- Configuration and Deployment
- Operations
GitLab projects that rely on Runner public-facing APIs
The following projects depend on the public Runner APIs, and should be taken
into consideration in the scope of any changes/deprecations to the public API surface:
Technologies
We spend a lot of time working in Go, the language GitLab Runner is written in. We also contribute to the main GitLab app, working in Rails and Vue.js. Familiarity with Docker and Kubernetes is also useful on our team.
Common Links
How we work
Iterations
We work in monthly iterations. Iteration planning dates for the upcoming milestone are aligned with GitLab’s product development timeline.
At least 30 days before the start of a milestone, the runner PM reviews and, as needed, re-prioritizes the features to be included in the iteration planning issue. The planning issue is a tool for asynchronous collaboration between the PM, EM and members of the team. We use cross-functional prioritization to guide the collaboration process.
The commitments for the iteration plan are directly related to the capacity of the team for the upcoming iteration. Therefore, to finalize the iteration plan (resource allocation) for a milestone, we evaluate and consider the following:
- Forced prioritization issues (these issues will always be first in line for resource allocation.)
- In flight development work that did not complete prior to the feature freeze.
- Strategic direction features.
- Community or customer requested features.
- Bugs
- Technical Debt
- Maintenance
- Community merge requests review assignments
Iteration Planning and Issue Refinement Process
- The PM creates iteration planning issues for at minimum the next three milestones.
- The PM adds candidate issues to the planning issues, applying the appropriate priority label for the iteration (e.g. Runner::P1).
- The PM adds the scoped label ~candidate::x.y to each issue. For example, ~candidate::16.0.
- The PM assigns the iteration planning issues to the runner EM, UX, QE and TW counterparts.
- The EM reviews all candidate tech debt, bugs, security and feature issues and applies the deliverable label to issues based on team capacity. The deliverable label signals a commitment for delivery and is tied directly to our team KPIs. Any issue not receiving the deliverable label will be treated as stretch and pulled in as team members have capacity.
- At minimum, three business days prior to GitLab’s monthly release kickoff livestream, the PM, EM, Quality and UX leads finalize the iteration plan for the upcoming milestone.
As we have a lot of involvement with our stable counterparts and reliability team, we also add a section to our iteration plan to reflect any blocking or related issues.
- The engineering team adds all blocking or related reliability issues to the iteration plan.
- The reliability team reviews these issues and checks feasibility and suggests changes.
- The reliability team commits to their issues in the iteration plan as long as:
- They don’t affect the current due dates for an ongoing KR.
- They fit under one of the quarterly OKRs of the Reliability::Practices team.
- They take into account downtime related to the OnCall and OnCall follow up work.
Prioritization labeling
To indicate priority of issues during an iteration we may use labels ~"Runner::P1" ~"Runner::P2" ~"Runner::P3". At a minimum we will always identify our top priorities using ~"Runner::P1".
- ~"Runner::P1" means “elevated priority”. We aim to deliver all or most of these issues.
- ~"Runner::P2" means “normal priority”.
- ~"Runner::P3" means “reduced priority”.
~"Runner::P*" labels can and should differ from ~priority:* labels. ~priority:* labels imply the timeline for when issues will be addressed, while ~"Runner::P*" labels indicate priority for the scheduled iteration.
Design and development process
We follow the product development flow. Our team uses one issue as SSOT for design, backend, and frontend work.
Once a problem is validated, the issue enters the design phase where the product designer collaborates with the team to ideate solutions and explore different approaches before converging on a single solution that is feasible and whose requirements meet the business goals.
Sometimes we need to increase our confidence that the proposed solution meets the user’s needs and expectations. This confidence can be obtained from additional research during the solution validation phase.
Following the design and validation phases, the problem should already be broken down into the quickest change possible to improve the user’s outcome and be ready for a more detailed review by engineering before moving to the build track.
Once the PM intends to prioritize the issue for the next milestone, the ~"workflow::planning breakdown" label is applied and the EM will assign a developer to further break down and apply weights to that work so that the issue can be moved to ~"workflow::ready for development".
Release
At the end of the iteration we release Runner and associated projects. The release process is documented here.
Guidelines for Merge Requests
As a developer on the runner team, you will be contributing to the various runner projects. Since the GitLab Runner project reviewers and maintainers review all code contributions (runner team members and community contributions), we must try and be as efficient as possible when submitting merge requests for review.
The responsibility of the merge request author
We follow the merge request author responsibility guidelines.
The responsibility of Reviewers and Maintainers
We follow the code review guidelines.
To help authors find a reviewer with capacity to take on a review, we have a spreadsheet dashboard that shows the number of MRs any of the backend members of the Verify:Runner or Verify:Runner SaaS groups have assigned.
If you are a reviewer or maintainer who has reached your limit of assigned review MRs, consider asking your peers for assistance by reassigning some to them. Additionally, consider pair-reviewing with the authors on a video call to speed up the review cycle, especially if you have multiple MRs to review from a single author.
Non-team member MRs count towards the WIP limit. At GitLab anyone can contribute, and codebases do not equal “teams” or “groups” (even if they happen to share a name). Therefore we should, from time to time, anticipate the occasional MR from a non-team member. Since other teams may not be familiar with our imposed WIP limits, we will need to accommodate them as best we can, and the reviewers may need to help with re-balancing their workload. We should not accept these MRs as a valid reason to go above the WIP limits.
These limits are intended to help with the work load on the reviewers and maintainers. If you are feeling pressured to rush through reviews, talk to your EM. Quality is always more important than speed of review.
Runner Group Specific Onboarding Needs
- editor access to the group-verify project in GCP
- Add as maintainer to the gitlab-com/runner-group group on GitLab.com
- Make sure entry in team.yml has the new member as a reviewer of gitlab-org/gitlab-runner and gitlab-org/ci-cd/custom-executor-drivers/autoscaler
- Add to Verify 1password vault (requires creating an access request).
Onboarding
When a new developer joins Runner, their responsibilities will include maintaining the runner project and all satellite repositories we own from their first day. This means that the developer will get Maintainer access to our repositories and will be added to the runner-maintainers group so they appear in the merge request approval group.
This allows the onboarding developer to grow organically over time in their responsibilities, which might include (non-exhaustive) code reviews, incident response, operations and releases. We should still follow the traditional two-stage review process for merges in most cases (incident response and operations being exceptions if the situation warrants it).
Becoming a maintainer for one of our projects
Although maintainer access is provided from day one for practical purposes,
we follow the same process outlined here.
Any engineer inside of the organization is welcome to become a
maintainer of a project owned by the Runner team.
Technical Debt / Backstage work
In general, technical debt, backstage work, or other classifications of development work that don’t directly contribute to a user’s experience with the runner are handled the same way as features or bugs and covered by the above Kanban style process. The one exception is that each engineer on the team can only have 1 technical debt issue in flight at a time. This means that if they start working on a technical debt type issue they cannot start another one until the first one is merged. In the event that an engineer has more than one technical debt item in flight, they should choose which one to keep working on and move the others to the “in development” or “ready for review” columns depending on their status. The intent of this limitation is to constrain the number of technical debt issues that are in review at any given time to help ensure we always have most of our capacity available to review and iterate on features or bugs.
Retrospectives
The team has a monthly retrospective meeting on the first Tuesday of the month. The agenda can be found here (internal link).
Deprecations process
At GitLab, our release post policy specifies that deprecation notices need to be added to the release post at least two cycles before the release when the feature is removed or officially obsolete. There are typically several deprecations or removals that the runner team needs to manage across the main runner project and the other projects that this team maintains. As such, the runner development team uses the following process to manage deprecations and removals. This process should start no later than one month after the launch of a major release.
- The assigned developer creates a Deprecations and Removal epic for the next major release. See example epic.
- The assigned developer collects all planned deprecations and removals with input from the development team and includes them in the epic.
- The assigned developer verifies that there are deprecation issues created for each deprecation.
- The assigned developer tags the runner development team, engineering manager, and product manager.
- The product manager uses the list of issues to create the deprecation notices. Our goal is to start announcing deprecations no later than six cycles before the next major release.
- The product manager will continue to include the deprecation notices in all release post entries up to and including the major release where the features will be fully deprecated or removed.
Managing CVE vulnerability report issues
Managing CVE vulnerability issues is part of GitLab’s vulnerability management effort (1, 2), and is an important part of maintaining the GitLab FedRAMP certification.
Using the container-scanners project, GitLab scans all images we produce to highlight CVE vulnerabilities. From those scans, the vulnmapper project creates issues in the project that created the vulnerable image, including SLAs to which we must adhere.
The Runner team member assigned the Support & Security Responder role in the weekly team task should triage and review the list of CVEs and address any issues as appropriate:
- Critical severity issues should be addressed immediately.
- High, Medium, and Low severity issues should be addressed in the priority order of the remediation SLAs.
The procedure for addressing CVE issues is as follows:
Surfacing active vulnerability reports
- Use one of the following to surface active CVE issues assigned to our team:
- Focusing on CVE reports in priority order, start with critical, high, and medium severities and proceed as follows:
- For each group of common/related issues, confirm that the associated CVE is still valid. This can be done by scanning the latest version of the image(s) identified in the issue(s) with tools such as trivy and grype, and checking whether the CVE referenced in the issue appears in the trivy or grype scan.
- If the vulnerability is no longer reported in the trivy or grype scan of the relevant image(s), the issue(s) can be closed. Note that the cver internal tool mentioned above largely automates this task, including closing the relevant issues (see the documentation).
- If the vulnerability is still present in the relevant image(s), it must be addressed.
Note that issues that reference ubi-fips flavors of gitlab-runner or gitlab-runner-helper images take precedence over other image flavors (like alpine or ubuntu) since the GitLab FedRAMP certification is contingent on ubi-fips images only.
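As a rough illustration of the confirmation step above, the sketch below checks whether a CVE ID still appears in a saved scan report. It is not the internal cver tool. The function name cve_in_report, the report file name, and the image reference are placeholders chosen for this example.

```shell
# Hypothetical helper: succeeds (exit 0) if the given CVE ID appears in a saved
# scan report, fails otherwise. The report is treated as plain text, so the
# same check works for trivy and grype JSON output.
cve_in_report() {
    cve_id="$1"
    report="$2"
    # -F: literal match, -w: whole word (CVE-2023-1234 must not match
    # CVE-2023-12345), -q: no output, only the exit status.
    grep -Fqw -- "$cve_id" "$report"
}

# Example usage (the image reference is a placeholder):
#   trivy image --format json --output report.json <image>
#   # or: grype <image> -o json > report.json
#   if cve_in_report CVE-2023-12345 report.json; then
#       echo "still present: must be addressed"
#   else
#       echo "no longer reported: issue can be closed"
#   fi
```

Checking the report as text keeps the sketch independent of either scanner's JSON layout, at the cost of not distinguishing which package or layer the finding belongs to.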
Addressing active vulnerability reports
Vulnerabilities usually appear in one of three flavors (ordered from most to least frequent):
- The vulnerability exists in a third-party OS package (like git or git-lfs).
- The vulnerability exists in gitlab-runner in one of its dependencies.
- The vulnerability exists in gitlab-runner in code we’ve written.
Third-party OS packages
In this case, the vulnerability:
- Has not been fixed upstream
- Has been fixed upstream but an OS package including the fix has not been created and published yet
- Will not be fixed upstream
The primary course of action here is to create a deviation request issue (see
https://handbook.gitlab.com/handbook/security/security-assurance/security-compliance/poam-deviation-request-procedure/).
We generally create one deviation request issue per offending software module (e.g. git-lfs or libcurl). When creating the issue, be sure to select operational_requirement_template as a template and complete the following sections:
- Affected images
- Vulnerability details (one row for each relevant CVE report)
- Relevant vulnmapper issues
- Justification
Once the deviation request issue is created, add:
- A note to all the relevant gitlab-runner issues pointing to the deviation request issue
- The label FedRAMP::DR Status::Open
- The most relevant label from this list:
Vulnerability::Vendor Base Container::Fix Unavailable
Vulnerability::Vendor Base Container::Will Not Be Fixed
Vulnerability::Vendor Package::Fix Unavailable
Vulnerability::Vendor Package::Will Not Be Fixed
Eventually, a fix in the offending package will make its way to the OS package manager, and then both the gitlab-runner and deviation request issues can be closed.
gitlab-runner dependencies
The simplest course of action here is to update the dependency to the latest compatible version (or at least a version that addresses the vulnerability). Once the MR with the dependency update is merged, the gitlab-runner issue can be closed.
If no available version of the dependency addresses the vulnerability, possible courses of action are:
- If a fork of the dependency that addresses the vulnerability exists, use it with the Go module replace directive. In this case, be sure to create a task to switch back to the upstream dependency when the vulnerability has been addressed there.
- If possible, consider not using the dependency or replacing it with another similar dependency.
- Create a deviation request issue.
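For the fork option above, the replace directive can be added and later removed with go mod edit. The following is a minimal sketch, not the team's documented procedure. The helper names use_fork and drop_fork, the module paths, and the version are all hypothetical.

```shell
# Hypothetical helpers around `go mod edit`; run them from the module root.
# The module paths and version in the usage example are placeholders.

# Point an upstream dependency at a fork that carries the fix.
use_fork() {
    upstream="$1"
    fork="$2"
    version="$3"
    go mod edit -replace="${upstream}=${fork}@${version}"
}

# Switch back to upstream once the fix has landed there.
drop_fork() {
    go mod edit -dropreplace="$1"
}

# Example usage:
#   use_fork example.com/upstream/dep example.com/fork/dep v1.2.3
#   go mod tidy   # refresh go.sum for the replacement
#   ...
#   drop_fork example.com/upstream/dep
```

Keeping the removal as its own helper mirrors the follow-up task mentioned above: the replacement is meant to be temporary.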
gitlab-runner source
The only course of action here is to fix the vulnerable code. If the fix is not simple and will take time to implement
(and prevent us from meeting CVE SLAs), it might be necessary to create a deviation request issue.
Working with security forks
When issues are marked confidential, the MR that fixes the issue should be made in a project’s security fork (see
security-forks). In general the process is identical to
creating and merging MRs in the canonical project repo, with a couple of notable differences.
Note that MRs in the security repo must be reviewed/approved by a security counterpart in addition to a runner
code-owner.
The examples below are given for the GitLab Runner project, but apply
equally to all runner-related projects with security forks.
Keeping the security fork up to date with its canonical repo
Security forks are configured to automatically synchronize with the canonical repo, but this can be disabled if changes exist in the security fork’s main branch that do not exist in the canonical repo’s main branch. This usually happens when a security MR is merged into the security fork’s main branch, but not into the canonical repo’s main branch.
In this event, it is necessary to manually synchronize the security fork against the canonical repo.
From a checked-out canonical repo:
git fetch # ensure you have the latest changes from the canonical repo.
git remote add security git@gitlab.com:gitlab-org/security/gitlab-runner.git # add the security repo as a remote, be sure to use the git url.
git fetch security # fetch the security fork repo references.
git checkout -b security-main security/main # check out the security fork's main branch.
git rebase --rebase-merges origin/main # replay the security-only commits on top of the canonical main.
git log --color --topo-order --oneline # ensure the resulting history is sane.
git push --force security security-main:main # push the local security main branch to main on the security remote (explicit refspec, since the local branch name differs).
Notes:
- These steps will not fully synchronize the security and canonical repositories in both directions. They will only bring changes that exist only in the canonical repo into the security repo. Synchronizing in the other direction is described below.
- The security repos do/should not have force-push branch protection on the main branch, but if the one you are working with does, temporarily disable it so you can perform the last step.
- If the security fork main branch becomes too out of date with the canonical repo main branch (specifically with changes that exist only in the security repo), merge conflicts are likely to occur during the rebase. You will have to resolve these.
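One way to double-check the result before pushing is to confirm that the canonical main is fully contained in the rebased security branch. The sketch below assumes the remote and branch names used in the steps above. The helper name canonical_is_contained is hypothetical.

```shell
# Hypothetical helper: succeeds if every commit on the canonical ref is
# already contained in the security ref, i.e. nothing is left to bring over.
canonical_is_contained() {
    canonical_ref="$1"
    security_ref="$2"
    git merge-base --is-ancestor "$canonical_ref" "$security_ref"
}

# Example usage, with the remote and branch names from the steps above:
#   canonical_is_contained origin/main security-main && echo "canonical main is contained"
#   git log --oneline origin/main..security-main   # commits that exist only in the security fork
```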
Merging security MRs back into the canonical repo
When MRs created in the security repo are merged (into the security repo’s main branch), the security and canonical repo will become unsynchronized. Merging MRs from the security fork back into the canonical repo is a manual process.
Each MR in the security repo that a developer wants to incorporate into the canonical repo must be brought over manually via a new MR in the canonical repo. This procedure is manual so developers can control when these merges are done.
To merge an MR already merged in the security fork main branch into the canonical repo, follow these steps:
From a checked-out canonical repo:
git fetch # ensure you have the latest changes from the canonical repo.
git remote add security git@gitlab.com:gitlab-org/security/gitlab-runner.git # add the security repo as a remote, be sure to use the git url.
git fetch security # fetch the security fork repo references.
git checkout -b name-of-working-branch origin/main # create a new branch into which you'll cherry-pick commits from the security repo.
git cherry-pick sha-of-commit-in-security-repo # cherry-pick all commits from the relevant MR from the security repo into your branch in the canonical repo.
Repeat the final step for all commits in the relevant MR, in topological order (oldest first). Do not include the MR’s merge commit in the cherry-picked commits.
Finally, create an MR in the canonical repo from this branch as usual.
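When an MR contains many commits, the cherry-pick step above can be scripted. The sketch below assumes the MR was merged into the security fork's main branch with a merge commit (not squashed or fast-forwarded). The helper name mr_commits and the SHA placeholder are hypothetical.

```shell
# Hypothetical helper: list the commits an MR brought in, oldest first,
# given the SHA of its merge commit. Merge commits are left out.
mr_commits() {
    merge_sha="$1"
    # <merge>^1..<merge> selects what the merge added on top of its first parent.
    git rev-list --topo-order --reverse --no-merges "${merge_sha}^1..${merge_sha}"
}

# Example usage on the working branch (the SHA is a placeholder):
#   git cherry-pick $(mr_commits sha-of-merge-commit-in-security-repo)
```

Listing the commits first also makes it easy to review exactly what will be cherry-picked before touching the working branch.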
Notes:
- If the security fork becomes too out of date with the canonical repo, merge conflicts are likely when
cherry-picking the commits. You will have to resolve them.
- You should manually synchronize the security repo as described above immediately after the MR is merged into the canonical main.
- It is not the aim of these instructions to completely synchronize the security and canonical repos in both directions. Full synchronization will occur as a byproduct of merging all MRs from the security repo into the canonical repo. It is up to the developers’ discretion when this happens for each MR.
Issue Health Status Definitions
- On Track - We are confident this issue will be completed and live for the current milestone. It is all downhill from here.
- Needs Attention - There are concerns, new complexity, or unanswered questions that if left unattended will result in the issue missing its targeted release. Collaboration needed to get back On Track within the week.
- If you are moving an item into this status please mention individuals in the issue you believe can help out in order to unstick the item so that it can get back to an On Track status.
- At Risk - The issue in its current state will not make the planned release and immediate action is needed to get it back to On Track today.
- If you are moving an item into this status please consider posting in a relevant team channel in Slack. Try to include anything that can be done to unstick the item so that it can get back to an On Track status in your message.
- Note: It is possible that there is nothing to be done that can get the item back on track in the current milestone. If that is the case please let your manager know as soon as you are aware of this.
Async Issue progress updates
When an engineer is actively working on an issue (workflow of ~workflow::“In dev” or further right on the current milestone), they will periodically leave status updates as top-level comments in the issue. The status comment should include the updated health status, any blockers, notes on what was done, whether review has started, and anything else the engineer feels is beneficial. If multiple people are working on the issue, also state whether this is a front end or back end update. The comment should include an update for each MR associated with the issue. Engineers should also update the health status of the issue at this time.
This update need not adhere to a particular format. Some ideas for formats:
Health status: (On track|Needs attention|At risk)
Notes: (Share what needs to be shared, especially when the issue needs attention or is at risk)

Health status: (On track|Needs attention|At risk)
What's left to be done:
What's blocking: (probably empty when on track)

## Update <date>
Health status: (On track|Needs attention|At risk)
What's left to be done:
#### MRs
1. !MyMR1
1. !MyMR2
1. !MyMR3
There are several benefits to this approach:
- Team members can better identify what they can do to help the issue move along the board
- Creates an opening for other engineers to engage and collaborate if they have ideas
- Leaving a status update is a good prompt to ask questions and start a discussion
- The wider GitLab community can more easily follow along with product development
- A history of the roadblocks the issue encountered is readily available in case of retrospection
- Product and Engineering managers are more easily able to keep informed of the progress of work
Some notes/suggestions:
- We typically expect engineers to leave at least one status update per week, barring special circumstances
- Ideally status updates are made at a logical part of an engineer’s workflow, to minimize disruption
- It is not necessary that the updates happen at the same time/day each week
- Generally when there is a logical time to leave an update, that is the best time
- Engineers are encouraged to use these updates as a place to collect some technical notes and thoughts or “think out loud” as they work through an issue
How to work with us
On issues
Issues worked on by the Runner group have a group label of ~group::runner. Issues that contribute to the verify stage of the DevOps toolchain have the ~devops::verify label.
Get our attention
GitLab.com: @gitlab-com/runner-group
Slack: #g_runner
Code review
Our code review process follows the general process where you choose a reviewer (usually not a maintainer) and then send it over to a maintainer for the final review.
Current maintainers are members of the runner-maintainers group.
Current reviewers are members of the runner-group group.
Runner PM and engineering pre and post-sales process for runner scaling and configuration deep dives
As part of the pre-sales and post-sales engagement, your customer may have in-depth questions regarding topics such as GitLab Runner configuration, autoscaling options, how concurrency works, distributing the CI jobs workload, monitoring runners, and so on. The goal of the process below is to enable the runner team to be as efficient as possible in providing the level of support that our sales team and customers require.
Step 1
Step 2
- Open an issue in the customer collaboration project and capture the specific configuration questions that the customer has. The purpose of the issue is to address some questions async if possible and finalize the agenda for any follow-up sync calls. It also allows us to identify if we need to invite a specific engineer to the customer call. Example issue.
Step 3
- As needed, schedule the sync call with the customer and the Runner PM. The Runner PM will determine if other runner engineers will be included on the call.
Team Resources
See dedicated page.
Overview
The goal of this page is to create, share and iterate on the Jobs to be Done (JTBD) and their corresponding job statements for the Runner group. Our goal is to utilize the JTBD framework to better understand our buyers’ and users’ needs.
Goals
Utilize JTBD and job statements to:
- Understand our users’ motivations.
- Validate identified use cases and solutions.
- Continuously test and iterate features to ensure we are meeting our customers’ and users’ needs.
- Create a transparent view for our stakeholders into the current and future state of the product.
Challenges
- DevOps teams may be managing hundreds of Runners, which may also mean offering different virtual machine configurations and sizes, which adds operational complexity. JTBDs will help us home in on the complex tasks these users need to complete to successfully use our tools.
- Runner’s strategy must consider and balance the needs of both self-managed and GitLab.com product offerings.
- Configuring and managing runners are crucial steps in the continuous integration path, but today a first-class enterprise runner management experience is not easily accessible to users. Bringing this capability to a higher level can increase our reach and growth.
JTBD
Getting started with GitLab Runner
When using a CI/CD tool for the first time, I need to understand what software I need to install and configure to execute the pipeline jobs.
The goal of this page is to document a general risk map for the Runner group.
The goal of this page is to document resources needed for day-to-day work within the Runner group.
The GitLab Runner Group's user stories.
This project's scope is to replace the current autoscaling technology, Docker Machine, used for the GitLab SaaS Shared Runners.