Verify:Runner Project Plans

This project’s scope is to replace the current autoscaling technology, Docker Machine, used for the GitLab SaaS Shared Runners.

Autoscaling Provider for GitLab Runner to replace Docker Machine

Runner Team DRI: Arran Walker

Description: This project’s scope is to replace the current autoscaling technology, Docker Machine, used for the GitLab SaaS Shared Runners. To view the complete implementation plan, please visit the parent epic that is currently tracking this work.

Week of 2023-09-18

Goals

Week of 2023-09-25

Goals

Week of 2023-10-02

Goals

Week of 2023-10-09

Goals

Week of 2023-10-16

Goals

Dedicated SaaS Runners for GitLab Dedicated

Runner Team DRI: Joseph Burnett

Slack Channel: #f_hosted_runners

Description: A dedicated runner is registered only to a specific project, group, or instance and is not shared with other users. With this project, GitLab will spin up dedicated runner resources within the Dedicated cloud account. The project plan tracking this work can be found here.

Iteration 16.5 (ending 2023-10-19)

Week of 2023-09-25

Week of 2023-10-02

Goals

  • Consolidate all work under a single sub-epic. Update issue and epic structure to reflect our agreement to deliver infrastructure-as-a-library (GRIT) and for Environment Automation to operate the runners themselves.

Week of 2023-10-09

Clarified epic and issue structure with Dedicated (thread).

New project plan for the Runner side, replacing the previous epic: https://gitlab.com/gitlab-org/ci-cd/shared-runners/infrastructure/-/issues/158

New GRIT sub-epic to track all library work for this use case: https://gitlab.com/groups/gitlab-org/ci-cd/runner-tools/-/epics/2

Goals

  • Create a provisional Docker-capable Linux AMI.
  • Update dev environment to support Linux.

Iteration 16.6 (ending 2023-11-17)

Goals

Week of 2023-11-20

Completed end-to-end functionality for both the test template and the prod template. End-to-end testing has been added, and unit tests are in progress while we decide on a reusable approach for all unit test cases. The demo video for GRIT beta prod was recorded, demonstrating the latest state of GRIT using the prod template.

Iteration Goals

Week of 2023-11-27

If you haven't had a chance, please check out the demo video for GRIT beta prod, which demonstrates the latest state of GRIT using the prod template. Progress was made last week on adding unit tests, and the only blocker appears to be an issue that is breaking E2E tests in both the unit test branch and the master branch. We will investigate and get the tests passing this week. We will also update the READMEs to reflect the recent changes and refactors, and begin adding granularity to VPC configuration in the prod module.

Goals

Week of 2023-12-04

We investigated the failing E2E tests and discovered the issue was caused by changes to log levels in GitLab Runner. The E2E tests were fixed and the unit tests merged. I began updating the README and discovered that our refactors have broken some configurations that involve GCP. These configurations predated our unit and E2E testing. I opened an issue and put up an MR to fix the broken configurations and add test coverage. Our current focus is on AWS for Dedicated Runners, so the GCP fixes cover only the cases shown in the README for now. We will follow up with more thorough GCP tests when our focus moves to GCP at a later date. We also began discussions on how to customize VPCs and subnets in Dedicated Runners.

Goals

Runner Fleet Dashboard

Runner Team DRI: Vladimir Shushlin

Slack Channel: #f_runner_fleet_management

Description: Operators of self-managed runner fleets need at-a-glance observability: the ability to quickly answer critical questions about their runner fleet infrastructure. Providing actionable insights in the Runner Fleet Dashboard equips runner fleet operators with the tools they need to ensure that developers in their organization can consistently and efficiently run CI/CD jobs at scale. Answers to questions such as how fast CI/CD jobs will start, whether CI/CD jobs are waiting in a queue, and whether there are performance or other problems with the CI/CD job environment will be readily available in the Runner Fleet Dashboard. The result is improved developer efficiency, reduced costs, and an excellent customer experience for the development teams that rely on the CI/CD build infrastructure.

Week of 2023-10-02

Goals

Summary

We got everything working on Staging, but discovered two bugs in data ingestion:

  1. duplicating some data
  2. not handling removed ci_builds

So we disabled data ingestion, and are working on fixing it.
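The two bugs above have the same shape as a common sync problem: replayed batches must not create duplicates, and rows removed at the source must disappear from the target. The sketch below illustrates that idea only; it is not the dashboard's actual ingestion code, and `BuildRow`, `sync_builds`, and `removed_ids` are hypothetical names chosen for the example.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class BuildRow:
    """Hypothetical, simplified stand-in for a finished CI build record."""
    id: int
    status: str


def sync_builds(
    target: dict[int, BuildRow],
    batch: list[BuildRow],
    removed_ids: set[int],
) -> dict[int, BuildRow]:
    """Return the target state after applying one ingestion batch.

    Rows are keyed by id, so replaying a batch overwrites existing
    rows instead of appending duplicates. Ids listed in removed_ids
    are dropped, which covers builds deleted at the source.
    """
    result = dict(target)
    for row in batch:
        result[row.id] = row          # upsert: last write for an id wins
    for build_id in removed_ids:
        result.pop(build_id, None)    # ignore ids that were never ingested
    return result


batch = [BuildRow(1, "success"), BuildRow(2, "failed"), BuildRow(1, "success")]
state = sync_builds({}, batch, removed_ids=set())
state = sync_builds(state, batch, removed_ids={2})  # replay plus a removal
print(sorted(state))  # [1]
```

Keying on the build id makes the operation idempotent, so a retried or overlapping batch is harmless. How removals are detected in practice is left open here.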

We also reviewed the bigger epic and decided that nothing blocks us from enabling the dashboard for everyone without the ClickHouse part of the data.

There are ongoing discussions on how to use this CI analytics architecture in other features.

Week of 2023-10-09

Continuing to focus on the issues found last week, with the goal of re-enabling the dashboard on staging and production.

Goals

Summary

  1. We got the Runner dashboard enabled on both staging and production, including ClickHouse!
  2. We decided not to release the dashboard in 16.5. We want to test it a bit better and prepare a proper release post.
  3. The Omnibus part is somewhat stuck in review.
  4. During data ingestion we discovered that we duplicate some data. This needs investigation.

Week of 2023-10-16

With the production rollout complete, we will now focus on delivering the dashboard (including ClickHouse) to self-managed. We also have half of the usual backend capacity throughout 16.6 due to planned time off.

Goals

Weeks of 2023-10-23 - 2023-11-20

We released the dashboard for self-managed in 16.6 and worked on delivering the ClickHouse-powered part of the dashboard to self-managed.

Goals

Week of 2023-11-27

With basic migrations support in place, we’re now working on running those migrations automatically during the normal GitLab upgrade process for Omnibus and the GitLab chart.

We’re also working on fixing the duplicates issue.

Goals

Last modified June 27, 2024: Fix various vale errors (46417d02)