Automated Reporting of the Most Flaky Tests

Overview

This document outlines the automated process for identifying, tracking, and reporting flaky tests in the gitlab-org/gitlab and gitlab-org/gitlab-foss projects. The system runs weekly to detect the most problematic flaky tests and creates issues so that development teams can investigate them.

How It Works

Schedule

Frequency: Weekly on Sundays at 01:00 UTC
Pipeline Schedule: create-update-issues-for-top-flaky-tests (internal)
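For reference, "Sundays at 01:00 UTC" is what the cron expression `0 1 * * 0` describes when evaluated in UTC. The schedule's actual configuration is not shown on this page, so the following is only a sketch of that timing; `next_scheduled_run` is a hypothetical helper, not part of the automation.

```python
from datetime import datetime, timedelta, timezone

# Cron equivalent of "Sundays at 01:00 UTC"
# (fields: minute, hour, day of month, month, day of week).
WEEKLY_CRON = "0 1 * * 0"


def next_scheduled_run(now: datetime) -> datetime:
    """Return the next Sunday 01:00 UTC strictly after `now` (hypothetical helper)."""
    now = now.astimezone(timezone.utc)
    # datetime.weekday(): Monday is 0, Sunday is 6.
    days_until_sunday = (6 - now.weekday()) % 7
    candidate = (now + timedelta(days=days_until_sunday)).replace(
        hour=1, minute=0, second=0, microsecond=0
    )
    if candidate <= now:
        # Already past this week's run, so move to the following Sunday.
        candidate += timedelta(days=7)
    return candidate
```

Passing a timezone-aware `datetime` avoids any dependence on the local timezone of the machine running the sketch.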

Process Overview

1. Update Most Flaky Tests Job (`update-most-flaky-tests`)

This job maintains existing flaky test issues by:

- Searching for open issues with the automation:top-flaky-test label in the gitlab-org/gitlab project
- Querying Snowflake for new failures since each issue was created
- Adding comments with details about new failures to existing issues
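The three steps above can be sketched as follows. This is an illustration under stated assumptions, not the job's actual code: the `Failure` fields, the helper names, and the comment wording are hypothetical, and the Snowflake query is replaced by an in-memory list. The only external detail relied on is that GitLab's list-issues REST endpoint accepts `labels` and `state` query parameters.

```python
from dataclasses import dataclass
from datetime import datetime
from urllib.parse import urlencode

LABEL = "automation:top-flaky-test"


def open_issues_query(label: str = LABEL) -> str:
    """Query string for listing open issues that carry the given label."""
    return urlencode({"labels": label, "state": "opened"})


@dataclass
class Failure:
    test_id: str          # hypothetical identifier, e.g. a spec file path
    failed_at: datetime   # when the failing job ran
    job_url: str          # link to the failing job


def new_failures_since(created_at, test_id, failures):
    """Keep failures of one test that happened after its issue was created."""
    matching = [
        f for f in failures
        if f.test_id == test_id and f.failed_at > created_at
    ]
    return sorted(matching, key=lambda f: f.failed_at)


def build_comment(failures) -> str:
    """Render a comment body listing the new failures (hypothetical wording)."""
    lines = [f"{len(failures)} new failure(s) since this issue was created:"]
    lines += [f"- {f.failed_at.isoformat()} {f.job_url}" for f in failures]
    return "\n".join(lines)
```

In this sketch an issue with no new failures yields an empty list, so the caller can skip posting a comment for it.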

2. Report Most Flaky Tests Job (`report-most-flaky-tests`)

This job identifies new flaky tests by:

- Querying Snowflake for RSpec test failure data from:
  - gitlab-org/gitlab
  - gitlab-org/gitlab-foss
- Filtering out failures related to:
  - Master broken incidents
  - Valid test failures
- Identifying the top 50 most flaky tests based on pipeline impact
- Creating issues only for tests that don’t already have existing issues
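A minimal sketch of the selection logic described above. It assumes that "pipeline impact" means the number of distinct pipelines in which a test failed; the source does not define the metric, so treat that as an assumption. The `FailureRow` fields, including the two boolean flags, are hypothetical stand-ins for whatever the Snowflake query returns.

```python
from collections import defaultdict
from dataclasses import dataclass

TOP_N = 50


@dataclass(frozen=True)
class FailureRow:
    test_id: str
    pipeline_id: int
    master_broken: bool = False   # hypothetical flag: tied to a master broken incident
    valid_failure: bool = False   # hypothetical flag: the test failed for a real reason


def top_flaky_tests(rows, existing_issue_test_ids, top_n=TOP_N):
    """Rank tests by distinct failing pipelines, then drop tests that already have issues."""
    pipelines_by_test = defaultdict(set)
    for row in rows:
        if row.master_broken or row.valid_failure:
            continue  # filtered out before ranking
        pipelines_by_test[row.test_id].add(row.pipeline_id)

    # Highest impact first; ties are broken by test id so the order is stable.
    ranked = sorted(
        pipelines_by_test,
        key=lambda test_id: (-len(pipelines_by_test[test_id]), test_id),
    )
    top = ranked[:top_n]
    return [test_id for test_id in top if test_id not in existing_issue_test_ids]
```

The sketch takes the top N first and removes already-tracked tests afterwards, mirroring the order of the steps in the list, so a run can create fewer than 50 new issues.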

Note: The filtering logic is not fully accurate, because failures caused by master broken incidents and valid test failures cannot all be identified reliably. Manual review is therefore required, as described below in Manual Follow-Up Steps.

Issue Creation Details

Each automatically created issue includes:

Labels:

- automation:top-flaky-test
- automation:bot-authored
- backlog::prospective
- type::maintenance

Content:

- Comprehensive description with test-specific details
- List of all failures from the past week
- Debugging guidance and data collection instructions
- Validation steps for confirming when flakiness is fixed
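To illustrate how the labels and content above could come together, here is a hedged sketch of a request body for GitLab's create-issue REST endpoint, which accepts `title`, `description`, and a comma-separated `labels` string. The title format, section headings, and function name are hypothetical; the real issue template is not reproduced on this page.

```python
LABELS = [
    "automation:top-flaky-test",
    "automation:bot-authored",
    "backlog::prospective",
    "type::maintenance",
]


def build_issue_payload(test_id, failure_urls):
    """Assemble an issue payload for one flaky test (hypothetical layout)."""
    description = "\n".join(
        [
            f"Test: {test_id}",
            "",
            "Failures from the past week:",
            *[f"- {url}" for url in failure_urls],
            "",
            "Debugging guidance: (placeholder)",
            "Validation steps: (placeholder)",
        ]
    )
    return {
        "title": f"Flaky test: {test_id}",   # hypothetical title format
        "description": description,
        "labels": ",".join(LABELS),           # sent as one comma-separated string
    }
```

Keeping the labels in a single list makes it easy to confirm that every created issue carries all four of them.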

Manual Follow-Up Steps

After issues are automatically created:

1. Review: A Development Analytics engineer reviews all generated issues.
2. Validation: The engineer confirms that each issue highlights an actual flaky test (not a false positive).
3. Tracking: The engineer adds the flaky-test-reviewed label to mark the issue as reviewed.
4. Assignment: The engineer assigns validated issues to the appropriate stage groups for investigation and resolution.

Triage using GitLab Duo Agent

The ci-alerts project includes a Duo Chat template (internal) that triages a flaky test and generates a structured report. Follow the steps below to triage an issue created for a flaky test using Duo:

Video (internal)

Locally in a code editor

Open the ci-alerts project locally and give Duo Agentic Chat the following prompts:

1. “Triage this flaky test issue <issue_url> using flaky-test-triage-report-template.md” to generate the report.
2. If you are satisfied with the report details, “Post as a comment” to post the report as a comment on the issue.

In the browser

Open the Duo Chat template in the browser, select the required Duo Agentic Chat model (Claude/ChatGPT), and give the same prompts as above.

Data Source

All test failure data is sourced from Snowflake queries that analyze historical pipeline failures and identify patterns indicating flaky behavior.

This process is maintained by the Development Analytics team. For questions or issues with the automation, contact the team in #g_development_analytics or create an issue in the ci-alerts (internal) project.

Last modified October 13, 2025: Update documentation for triaging Flaky tests in web browser (c333a9ac)
