Distribution Team Infrastructure and Maintenance - Build Machines
Build Machines
The GitLab CI runner manager is responsible for creating build machines for package builds. The node configuration is managed by cookbook-gitlab-runner. Configuration values are stored in a vault with the same name as the node, see example.
Currently, the version of GitLab CI runner is locked. We aim to stay close to the current runner version so that we get the fixes we need without running into issues that could cause failures. Such failures could prevent a release from going out, so avoid unnecessary changes on these nodes.
Runner manager machines
The Distribution team maintains 2 runner manager machines for running different types of pipelines. Both machines are in the GCP project omnibus-build-runners.
- build-runners.gitlab.org
- build-trigger-runner-manager.gitlab.org
build-runners-gitlab-org
This runner manager manages the machines used for building and publishing
official GitLab CE and EE packages. It is locked to the omnibus-gitlab
and cookbooks/gitlab-omnibus-builder
projects in dev.gitlab.org.
Its configuration can be found in the private gitlab-com/gl-infra/chef-repo
project.
It spins up the following types of machines:
- x86_64 machines for building packages. They are n1-highcpu-32 machines with 80GB SSD disks, spawned inside GCP using the google docker-machine driver.
- arm64-builder-dev-gitlab for building Docker images for ARM and RPi. They are m6g.2xlarge machines with 80GB solid state disks, spawned inside AWS using the amazonec2 docker-machine driver. This is used in https://dev.gitlab.org/cookbooks/gitlab-omnibus-builder.
- arm64-runners-manager-dev-gitlab for building ARM and RPi packages. They are m6g.2xlarge machines with 80GB solid state disks, spawned inside AWS using the amazonec2 docker-machine driver.
- package-promotion machines for uploading packages. Since they are only used to upload packages, they are scaled down to save costs. They are n2d-standard-2 machines, spawned inside GCP using the google docker-machine driver.
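To make the machine description above more concrete, here is a minimal sketch of how the x86_64 specification could map onto google driver options for docker-machine. It only prints the command and does not run it. The machine name is a hypothetical argument, and the real options live in the private chef-repo configuration, so treat the exact set of flags as an assumption rather than the deployed configuration.

```shell
# Print (do not run) a docker-machine command for one x86_64 build machine.
# Machine type, disk size, disk kind, and GCP project come from the text above.
# The machine name is hypothetical; the zone is left to the driver default.
print_create_cmd() {
  name="$1"
  echo docker-machine create \
    --driver google \
    --google-project omnibus-build-runners \
    --google-machine-type n1-highcpu-32 \
    --google-disk-type pd-ssd \
    --google-disk-size 80 \
    "$name"
}

print_create_cmd example-build-machine
```

In practice the runner manager creates these machines itself, so a command like this is mainly useful for understanding what the driver options mean.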
build-trigger-runner-manager-gitlab-org
This runner manager manages the machines used for building packages as part of the triggered pipelines that developers use to test their changes.
Its configuration can be found in the private gitlab-com/gl-infra/chef-repo
project.
It spins up the following types of machines:
- x86_64 machines for building packages in the gitlab-org/omnibus-gitlab-mirror project. They are n1-highcpu-32 machines with 80GB SSD disks, spawned inside GCP using the google docker-machine driver.
- ARM64 machines for building arm64 and Raspberry Pi builder images. They are m6g.2xlarge machines with 80GB solid state disks, spawned inside AWS using the amazonec2 docker-machine driver.
- qa-builder machines for running end-to-end tests in the gitlab-org/gitlab-qa and gitlab-org/gitlab-qa-mirror projects. They are n2d-standard-2 machines with 50GB disks, spawned inside GCP using the google docker-machine driver.
Maintenance tasks
Requirements:
- Access to the node
- Access to merge into master on the ops chef repo
- Some tasks need access to a Chef Vault admin. At minimum, contact the Engineering Manager, Distribution for help.
Changing version of GitLab CI runner
To be performed by any team member:
- Create a new merge request on the chef repo that updates the runner version.
- Ensure that CI passes and that the MR is reviewed by another team member.
- Merge the change into the chef repo.
- Log in to the node and run the following command to perform the upgrade:
  sudo /root/runner_upgrade.sh
  This stops the chef-client service, stops the runner and cleans up the machines, runs chef-client to fetch the new version, and finally starts GitLab Runner again.
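The upgrade sequence described above can be sketched as a small dry-run script. This is not the content of /root/runner_upgrade.sh, which is not shown in this document. It assumes that chef-client runs as a systemd unit, and cleanup_machines is a hypothetical placeholder because the real cleanup command is not documented here. With DRY_RUN left at its default of 1, each step is printed instead of executed.

```shell
#!/bin/sh
# Hypothetical sketch of the upgrade order; NOT the real runner_upgrade.sh.
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "would run: $*"
  else
    "$@"
  fi
}

# Placeholder: the actual machine cleanup command is an unknown here.
cleanup_machines() {
  echo "machine cleanup is site-specific; replace this placeholder" >&2
}

upgrade_runner() {
  run systemctl stop chef-client   # assumption: chef-client is a systemd unit
  run gitlab-runner stop           # stop the runner service
  run cleanup_machines             # hypothetical cleanup step
  run chef-client                  # converge the node to fetch the new version
  run gitlab-runner start          # start GitLab Runner again
}

upgrade_runner
```

Keeping the steps behind a single run helper makes it easy to review the order of operations before touching a node that releases depend on.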
When builds are pending on dev.gitlab.org
The most common reason for builds to be pending on dev.gitlab.org is a high number of failed machines. Failed machines prevent the runner manager from starting new machines, which can slow down or even block a release. To resolve this, clean up the failed machines as follows:
- Log in to the build machine node.
- Enter a root session with sudo su. This is required because the docker-machine command only lists machines that belong to the currently active user.
- Run docker-machine ls. This prints the list of machines, which are either in the Running or Error state or have an empty state.
- To list only machines in the Error state, run:
  /root/machines_operations.sh list-failing
- To safely clean up the machines in the Error state, run:
  /root/machines_operations.sh remove-failing
- If a machine has an empty state, you can always remove it manually. The following command removes all machines that are not in the Running state:
  docker-machine ls | grep -v 'Running' | awk '{print $1}' | xargs docker-machine rm --force
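Before force-removing machines, it can help to preview which names would be selected. The following is a minimal sketch, assuming the usual docker-machine ls column order (NAME, ACTIVE, DRIVER, STATE, ...). The helper name list_not_running is hypothetical. Unlike a plain grep -v, it checks only the STATE column and skips the header row.

```shell
# Print the names of machines whose STATE column is not "Running".
# Reads `docker-machine ls` style output on stdin and skips the header row.
# Assumption: STATE is the fourth whitespace-separated column.
list_not_running() {
  awk 'NR > 1 && $4 != "Running" { print $1 }'
}
```

A possible usage is to run docker-machine ls | list_not_running in the root session and review the printed names before passing them to docker-machine rm --force.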