Verify:Runner Project Plans
Autoscaling Provider for GitLab Runner to replace Docker Machine
Runner Team DRI: Arran Walker
Description: This project’s scope is to replace the current autoscaling technology, Docker Machine, used for the GitLab SaaS Shared Runners. To view the complete implementation plan, please visit the parent epic that is currently tracking this work.
Week of 2023-09-18
Goals
- fleeting: Log AWS Autoscaling Activity. Implementation revealed questions about whether we really want or need this.
- fleeting: Add Shutdown func to API
- fleeting: Add public provisioning integration tests that plugins can use
Week of 2023-09-25
Goals
- fleeting-plugin-aws: AWS plugin integration tests
- taskscaler: Integration tests
Week of 2023-10-02
Goals
- fleeting-plugin-googlecompute: GCP plugin integration tests
- runner: Taskscaler-based executor integration tests
Week of 2023-10-09
Goals
- fleeting-plugin-googlecompute: GCP plugin integration tests
- runner: Taskscaler-based executor integration tests
- fleeting-plugin-googlecompute: Unit tests
- runner-incept: End-to-End test runner manager in GCE
- fleeting-plugin-googlecompute: Recommend minimum IAM permissions for google compute plugin
Week of 2023-10-16
Goals
- fleeting-plugin-azure: Add README / configuration options
- fleeting-plugin-azure: Unit tests
- fleeting-plugin-azure: Integration tests
- taskscaler: Implement Acquisition.WithContext
Dedicated SaaS Runners for GitLab Dedicated
Runner Team DRI: Joseph Burnett
Slack Channel: #f_hosted_runners
Description: A dedicated runner is a runner that would only be registered to a specific project, group, or instance and not be shared with other users. With this project GitLab will spin up dedicated runner resources within the Dedicated cloud account. The project plan tracking this work can be found here.
Iteration 16.5 (ending 2023-10-19)
- https://gitlab.com/gitlab-org/ci-cd/runner-tools/grit/-/issues/14
- https://gitlab.com/gitlab-org/ci-cd/shared-runners/infrastructure/-/issues/87
Week of 2023-09-25
- Verify that the Runner SaaS delivery timeline meets the requirements of Environment Automation. Result: it does (https://gitlab.com/gitlab-com/gl-infra/gitlab-dedicated/team/-/issues/2825#note_1586596202)
Week of 2023-10-02
Goals
- Consolidate all work under a single sub-epic. Update the issue and epic structure to reflect our agreement that we deliver infrastructure-as-a-library (GRIT) and that Environment Automation operates the runners themselves.
Week of 2023-10-09
Clarified the epic and issue structure with Dedicated (thread).
- New project plan for the Runner side, replacing the previous epic: https://gitlab.com/gitlab-org/ci-cd/shared-runners/infrastructure/-/issues/158
- New GRIT sub-epic to track all library work for this use case: https://gitlab.com/groups/gitlab-org/ci-cd/runner-tools/-/epics/2
Goals
- Create a provisional Docker-capable Linux AMI
- Update the dev environment to support Linux.
Iteration 16.6 (ending 2023-11-17)
Goals
- Complete test template working end-to-end
- Complete prod template working end-to-end
- Unit tests (in progress: https://gitlab.com/gitlab-org/ci-cd/runner-tools/grit/-/merge_requests/6)
- End-to-End test (in review: https://gitlab.com/gitlab-org/ci-cd/runner-tools/grit/-/merge_requests/8)
- Demo and launch video (https://www.youtube.com/watch?v=K_eOuXN-nXM)
Week of 2023-11-20
Completed end-to-end functionality for both the test template and the prod template. End-to-end testing has been added, and unit tests are in progress while we decide on a reusable approach for all unit test cases. The demo video for GRIT beta prod was recorded, demonstrating the latest state of GRIT using the prod template.
Iteration Goals
- Unit tests (in progress: https://gitlab.com/gitlab-org/ci-cd/runner-tools/grit/-/merge_requests/6)
- Allow users to bring their own VPC (https://gitlab.com/gitlab-org/ci-cd/runner-tools/grit/-/issues/35)
- Update READMEs to match latest changes (https://gitlab.com/gitlab-org/ci-cd/runner-tools/grit/-/issues/36)
Week of 2023-11-27
If you haven't had a chance, please check out the demo video for GRIT beta prod, which demonstrates the latest state of GRIT using the prod template.
Progress was made last week on adding unit tests, and the only blocker appears to be an issue that is breaking e2e tests in both the unit test branch and the master branch. We will investigate and get the tests passing this week.
We will also update READMEs to reflect the recent changes and refactors, and begin adding granularity to VPC configuration in the prod module.
Goals
- Add Unit tests (in progress: https://gitlab.com/gitlab-org/ci-cd/runner-tools/grit/-/merge_requests/6)
- Begin issue: Allow users to bring their own VPC
- Fix and improve E2E tests
Week of 2023-12-04
We investigated the failing E2E tests and discovered the issue was caused by changes to log levels in GitLab Runner. The E2E tests were fixed and unit tests merged. I began updating the README and discovered our refactors have broken some configurations that involve GCP. These configurations predated our unit and e2e testing. I added an issue and put up an MR to fix the broken configurations and add test coverage. Our current focus is on AWS for Dedicated Runners, so my aim with the fixes for GCP configs in the README is to cover only those README cases at this time and follow up on more thorough GCP tests when we move our focus to GCP at a later date. We also began discussions on how to customize VPCs and subnets in Dedicated Runners.
Goals
- Fix broken README configurations
- Update READMEs to match latest changes
- Allow users to bring their own VPC
Runner Fleet Dashboard
Runner Team DRI: Vladimir Shushlin
Slack Channel: #f_runner_fleet_management
Description: Operators of self-managed runner fleets need at-a-glance observability: more specifically, the ability to quickly answer critical questions about their Runner Fleet infrastructure. Providing actionable insights in the Runner Fleet Dashboard equips GitLab Runner Fleet operators with the tools they need to ensure that developers in their organization can consistently and efficiently run CI/CD jobs at scale. The answers to questions such as "How fast will CI/CD jobs start?", "Are our CI/CD jobs waiting in a queue?", and "Are there performance or other problems with the CI/CD job environment?" will be readily available in the Runner Fleet Dashboard. The result is improved developer efficiency, reduced costs, and an excellent customer experience for the development teams that rely on the CI/CD build infrastructure.
Week of 2023-10-02
Goals
- Enable ClickHouse connection on Staging
- Enable Runner Dashboard on Staging
- Enable CI data ingestion on Staging
- Enable ClickHouse part of the dashboard on Staging
Summary
We got everything working on Staging but discovered two bugs in data ingestion, so we disabled data ingestion and are working on fixing it.
We also reviewed the bigger epic and decided that nothing blocks us from enabling the dashboard for everyone without the ClickHouse part of the data.
There are ongoing discussions on how to use this CI analytics architecture in other features.
Week of 2023-10-09
Continuing to focus on issues found last week, with the goal of re-enabling the dashboard on staging and production.
Goals
- Fix duplication bug
- Fix removed ci_builds bug
- Roll out and remove the runners_dashboard feature flag, thus enabling the Runner Dashboard for everyone
- Re-enable CI data ingestion on Staging and finish it
- Add ClickHouse credentials for Production
- Add support for ClickHouse in Omnibus, which is required for tests on self-managed as well as for using ClickHouse from the Rails console on Staging/Production
- Fix “always integer” bug
Summary
- We got Runner dashboard enabled on both staging and production including ClickHouse!
- Decided not to release dashboard in 16.5. We want to test it a bit better and prepare a proper release post.
- The Omnibus work got stuck in review for a while.
- During data ingestion we discovered that some data is duplicated; this needs investigation.
Week of 2023-10-16
With the production rollout complete, we will now focus on delivering the dashboard (including ClickHouse) to self-managed. We also have half of the usual backend capacity throughout 16.6 due to planned time off.
Goals
- Create closed-beta self-managed rollout plan
- Finish omnibus support
- Enable dashboard for self-managed and release in 16.6
- (stretch) Get a working PoC for ClickHouse migrations support
Weeks of 2023-10-23 - 2023-11-20
We released the dashboard for self-managed in 16.6 and have been working on delivering the ClickHouse-powered part of the dashboard to self-managed.
Goals
- Write the documentation for setting up ClickHouse and the dashboard on self-managed.
- Implement basic support for ClickHouse migrations
- Debug the duplicates issue on gitlab.com
Week of 2023-11-27
With basic migrations support in place, we’re now working on running those migrations automatically during the normal GitLab upgrade process for Omnibus and the GitLab chart.
We’re also working on fixing the duplicates issue.
Goals
- Fix the duplicates issue.
- Run ClickHouse migrations automatically during omnibus upgrade - get it working and in review
- Implement an exclusive lease to prevent parallel execution of migrations