Pipeline Monitoring

Overview of our monitoring tools and practices

End-to-end test pipelines

The test pipelines run on a scheduled basis, and their results are posted to Slack. The following are the end-to-end test pipelines that are monitored every day.

| Environment | Links | Tests type | Frequency | Slack channel | Latest test report |
| --- | --- | --- | --- | --- | --- |
| Production | Pipelines, Chatops | Smoke | after each deployment to Canary and after a feature flag in production has been updated to true or 100% | #e2e-run-production | Production Sanity |
| Canary | Pipelines, Definition | Smoke | after each deployment to Canary | #e2e-run-production | Canary Sanity |
| - | Pipelines, Definition | Full | after each deployment to Canary | #e2e-run-production | Canary Full |
| Staging | Pipelines, Definition | Smoke | after each deployment to Staging-Canary | #e2e-run-staging | Staging Sanity |
| - | Pipelines, Definition | Smoke | after the execution of post-deploy migrations in Staging | #e2e-run-staging | Staging Sanity |
| Staging Canary | Pipelines, Definition | Smoke | after each deployment to Staging | #e2e-run-staging | Staging-Canary |
| CustomersDot Staging | Pipelines, Definition | Full | after each deployment to CustomersDot Staging | #e2e-run-staging #s_fulfillment_status | CustomersDot Staging |
| Staging Ref | Pipelines, Definition | Smoke | after each deployment to Staging Ref | #e2e-run-staging-ref | Staging Ref Sanity |
| Preprod | Pipelines, Definition | Smoke | Every month for a few days before release, at 03:00 UTC and after deployment to preprod during Security and Patch releases | #e2e-run-preprod | Preprod |
| Release | Pipelines, Definition | Smoke | Every month after the final release and after deployment to Release during Security and Patch releases | #e2e-run-release | Release |
| GitLab master e2e:test-on-omnibus-ee | Pipelines, Definition | Full | scheduled pipeline every 2 hours | #e2e-run-master | Master EE |
| GitLab master e2e:test-on-omnibus-ce | Pipelines, Definition | Full | Daily at 4:00am UTC | #e2e-run-master | Master CE |
| GitLab master e2e:test-on-gdk | Pipelines, Definition | Full | scheduled pipeline every 2 hours | #e2e-run-master | Master GDK |
| GitLab master e2e:test-on-cng | Pipelines, Definition | Smoke, Blocking | scheduled pipeline every 2 hours | #e2e-run-master | Master CNG |
| GitLab master Nightly | Pipelines, Definition | Full | Daily at 4:00am UTC | #e2e-run-master | Master Nightly |

NOTE: For information on how to investigate failing tests and pipelines, see Debugging Failing Tests and Test Pipelines.

Test metrics

For visibility on the test health, we have test execution results exported to:

Test reports

Allure report

Allure test reports are another way we present test results. Tests that run in pipelines generate Allure reports. The QA framework uses the Allure RSpec gem to generate the source files for the Allure test report. An additional job in the pipeline:

  • Fetches these source files from all test jobs.
  • Generates and uploads the report to the S3 bucket gitlab-qa-allure-report located in AWS group project eng-quality-ops-ci-cd-shared-infra.
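
The two steps above can be sketched roughly as follows. This is a minimal illustration, not the actual job: the helper names (merge_allure_results, report_commands), the per-job directory layout, and the S3 key prefix are all assumptions made for the example. Only the bucket name comes from the text above. The commands are built as argument lists and not executed here.

```ruby
require 'fileutils'

# Bucket name taken from the text above.
BUCKET = 'gitlab-qa-allure-report'

# Hypothetical helper: copy every Allure source file produced by the
# individual test jobs into one merged directory.
def merge_allure_results(job_dirs, merged_dir)
  FileUtils.mkdir_p(merged_dir)
  job_dirs.each do |dir|
    Dir.glob(File.join(dir, '*')).select { |path| File.file?(path) }.each do |file|
      FileUtils.cp(file, merged_dir)
    end
  end
  Dir.children(merged_dir).sort
end

# Hypothetical helper: build (but do not run) the commands that would
# generate the report and sync it to S3. The prefix layout is an assumption.
def report_commands(merged_dir, report_dir, prefix)
  [
    ['allure', 'generate', merged_dir, '--clean', '-o', report_dir],
    ['aws', 's3', 'sync', report_dir, "s3://#{BUCKET}/#{prefix}"]
  ]
end
```

Each command array could then be passed to system(*command) by a real job. Keeping the merge step separate from command construction makes both easy to check without network access.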

Each type of scheduled pipeline generates a static link for the latest test report according to its stage:

| Environment | Description | Link |
| --- | --- | --- |
| master (gdk) | E2E test execution against gitlab-development-kit environment packaged in a Docker container. | Allure test report |
| master (test-on-omnibus) | E2E test execution against various configurations of omnibus images. | Allure test report |
| nightly | E2E test execution against various configurations of omnibus nightly images. | Allure test report |
| staging-full | E2E test execution against https://staging.gitlab.com environment. | Allure test report |
| staging-sanity | E2E test execution against various configurations of omnibus nightly images. | Allure test report |
| staging-ref-full | E2E test execution against https://staging-ref.gitlab.com environment. | Allure test report |
| staging-ref-sanity | E2E test execution against https://staging-ref.gitlab.com environment. | Allure test report |
| preprod | E2E test execution against https://pre.gitlab.com environment. | Allure test report |
| production-full | E2E test execution against https://gitlab.com environment. | Allure test report |
| production-sanity | E2E test execution against https://gitlab.com environment. | Allure test report |

These reports are also included in the pipeline status alerts on Slack.

Test session issue

For each end-to-end pipeline that runs in the environments we test automatically, we create a test session issue that contains the test session information. Test session issues group test results by DevOps stage and link to test cases and test failure issues.

Example of a test session issue: https://gitlab.com/gitlab-org/quality/testcase-sessions/-/issues/72516
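
To illustrate the grouping described above, here is a small sketch of how results might be summarized per DevOps stage. The TestResult struct, its fields, and summarize_by_stage are hypothetical names chosen for this example. The real test session issue is produced by our tooling and may be structured differently.

```ruby
# Hypothetical record for a single test result.
TestResult = Struct.new(:name, :stage, :status, :testcase, keyword_init: true)

# Group results by stage and collect per-stage totals, failure counts,
# and the linked test case URLs.
def summarize_by_stage(results)
  results.group_by(&:stage).transform_values do |stage_results|
    {
      total: stage_results.size,
      failed: stage_results.count { |result| result.status == :failed },
      testcases: stage_results.map(&:testcase)
    }
  end
end

# Placeholder URL in the same form the document uses for test cases.
url = 'https://gitlab.com/gitlab-org/gitlab/-/quality/test_cases/:test_case_id'

results = [
  TestResult.new(name: 'creates a project', stage: 'Create', status: :passed, testcase: url),
  TestResult.new(name: 'pushes code', stage: 'Create', status: :failed, testcase: url),
  TestResult.new(name: 'runs a pipeline', stage: 'Verify', status: :passed, testcase: url)
]

summary = summarize_by_stage(results)
summary.each do |stage, stats|
  puts "#{stage}: #{stats[:total]} tests, #{stats[:failed]} failed"
end
```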

Test session issues are a workaround for a missing GitLab feature. Once GitLab stores test data, we can improve failure reporting and management.

Test result issue

Each test is associated with a GitLab test case through its testcase metadata:

  RSpec.describe 'Stage' do
    describe 'General description of the feature under test' do
      # The testcase metadata links each example to its GitLab test case.
      it 'test name', testcase: 'https://gitlab.com/gitlab-org/gitlab/-/quality/test_cases/:test_case_id' do
        # ...
      end

      it 'another test', testcase: 'https://gitlab.com/gitlab-org/gitlab/-/quality/test_cases/:another_test_case_id' do
        # ...
      end
    end
  end

When a test fails, its stack trace is compared with the stack traces recorded in existing test failure issues. The existing issue whose stack trace is most similar to the failure (under a 15% difference threshold) is used. The failing job is then added to the failure report list in that issue. The group label is inferred automatically from the product_group metadata of the test.
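
The matching step can be sketched as below. This is an illustration under stated assumptions, not the actual implementation: the text does not say which similarity metric is used, so the sketch assumes Levenshtein edit distance divided by the length of the longer stack trace. The function names and the issue hash shape (iid, stack_trace) are hypothetical.

```ruby
# Classic dynamic-programming Levenshtein edit distance.
def levenshtein(a, b)
  prev = (0..b.length).to_a
  a.each_char.with_index(1) do |char_a, i|
    curr = [i]
    b.each_char.with_index(1) do |char_b, j|
      cost = char_a == char_b ? 0 : 1
      curr << [prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost].min
    end
    prev = curr
  end
  prev.last
end

# Difference as a fraction of the longer string (0.0 means identical).
# Assumption: the real tooling may normalize differently.
def difference_ratio(a, b)
  longest = [a.length, b.length].max
  return 0.0 if longest.zero?

  levenshtein(a, b).fdiv(longest)
end

MAX_DIFFERENCE = 0.15 # the 15% threshold mentioned above

# Return the existing issue whose stack trace is closest to the failure,
# or nil when even the closest one is not under the threshold.
def find_matching_issue(failure_trace, issues, max_difference: MAX_DIFFERENCE)
  best = issues
    .map { |issue| [issue, difference_ratio(failure_trace, issue[:stack_trace])] }
    .min_by { |_issue, diff| diff }
  return nil if best.nil? || best.last >= max_difference

  best.first
end
```

When find_matching_issue returns nil, no existing issue is similar enough. What the tooling does in that case is not stated here.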