Automated Reporting of the most flaky tests
Overview
This document outlines the automated process for identifying, tracking, and reporting flaky tests in the gitlab-org/gitlab and gitlab-org/gitlab-foss projects. The system runs weekly to detect the most problematic flaky tests and creates issues for development teams to investigate.
How It Works
Schedule
- Frequency: Weekly on Sundays at 01:00 UTC
- Pipeline Schedule: create-update-issues-for-top-flaky-tests (internal)
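To make the schedule concrete, here is a minimal sketch that computes the next run time for "Sundays at 01:00 UTC". The cron string `0 1 * * 0` is the conventional equivalent and is an assumption; the actual expression configured on the pipeline schedule is not shown in this document. The function name `next_weekly_run` is hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Assumed cron equivalent of "Sundays at 01:00 UTC"; the real schedule's
# cron string is not given in this document.
ASSUMED_CRON = "0 1 * * 0"


def next_weekly_run(now: datetime) -> datetime:
    """Return the next Sunday 01:00 UTC strictly after `now`.

    `now` must be timezone-aware.
    """
    now = now.astimezone(timezone.utc)
    # datetime.weekday(): Monday is 0 and Sunday is 6.
    days_ahead = (6 - now.weekday()) % 7
    candidate = (now + timedelta(days=days_ahead)).replace(
        hour=1, minute=0, second=0, microsecond=0
    )
    # If this week's slot has already passed (or is exactly now), move on a week.
    if candidate <= now:
        candidate += timedelta(days=7)
    return candidate


# Wednesday 2024-01-03 12:00 UTC -> Sunday 2024-01-07 01:00 UTC
print(next_weekly_run(datetime(2024, 1, 3, 12, 0, tzinfo=timezone.utc)))
```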
Process Overview
1. Update Most Flaky Tests Job (update-most-flaky-tests)
This job maintains existing flaky test issues by:
- Searching for open issues with the automation:top-flaky-test label in the gitlab-org/gitlab project
- Querying Snowflake for new failures since each issue was created
- Adding comments with details about new failures to existing issues
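The update steps above can be sketched roughly as follows. This is an illustration only, not the job's actual code: the `Issue` and `Failure` types, their fields, and matching an issue to failures by `test_file` are all assumptions, and the Snowflake query result is stood in for by an in-memory list.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass
class Issue:
    """Hypothetical view of an open automation:top-flaky-test issue."""
    iid: int
    test_file: str
    created_at: datetime


@dataclass
class Failure:
    """Hypothetical row returned by the failure query."""
    test_file: str
    job_url: str
    failed_at: datetime


def new_failures_for(issue: Issue, failures: List[Failure]) -> List[Failure]:
    """Keep failures of the issue's test that happened after the issue was created."""
    matching = [
        f for f in failures
        if f.test_file == issue.test_file and f.failed_at > issue.created_at
    ]
    return sorted(matching, key=lambda f: f.failed_at)


def build_comment(issue: Issue, failures: List[Failure]) -> Optional[str]:
    """Return the comment body to post, or None when there is nothing new."""
    new = new_failures_for(issue, failures)
    if not new:
        return None
    lines = [f"{len(new)} new failure(s) since this issue was created:"]
    lines += [f"- {f.failed_at.isoformat()} {f.job_url}" for f in new]
    return "\n".join(lines)
```

Returning `None` when there are no new failures lets a caller skip posting empty comments; whether the real job behaves this way is not stated here.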
2. Report Most Flaky Tests Job (report-most-flaky-tests)
This job identifies new flaky tests by:
- Querying Snowflake for RSpec test failure data from:
  - gitlab-org/gitlab
  - gitlab-org/gitlab-foss
- Filtering out failures related to:
  - Master broken incidents
  - Valid test failures
- Identifying the top 50 most flaky tests based on pipeline impact
- Creating issues only for tests that don’t already have existing issues
Note: The filtering logic is not 100% accurate, because it is difficult to reliably identify every failure caused by a master broken incident or by a valid test failure. Manual review is required, as described below in Manual Follow-Up Steps.
Issue Creation Details
Each automatically created issue includes:
Labels:
- automation:top-flaky-test
- automation:bot-authored
- backlog::prospective
- type::maintenance
Content:
- Comprehensive description with test-specific details
- List of all failures from the past week
- Debugging guidance and data collection instructions
- Validation steps for confirming when flakiness is fixed
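As a rough sketch of how such an issue could be assembled, the snippet below builds a payload carrying the four labels and a description with the four content sections listed above. The title format, the section wording, and the helper name `build_issue_payload` are hypothetical; the `title`, `description`, and comma-separated `labels` keys are assumed to match the fields accepted by GitLab's issue creation REST endpoint, and the real automation may build its issues differently.

```python
from typing import Dict, List

# Labels applied to each automatically created issue (from the list above).
TOP_FLAKY_LABELS = [
    "automation:top-flaky-test",
    "automation:bot-authored",
    "backlog::prospective",
    "type::maintenance",
]


def build_issue_payload(test_id: str, failure_urls: List[str]) -> Dict[str, str]:
    """Assemble a hypothetical issue payload for one flaky test."""
    failure_list = "\n".join(f"- {url}" for url in failure_urls)
    # Section texts are placeholders, not the wording of the real template.
    description = "\n\n".join([
        f"Test: {test_id}",
        "Failures from the past week:\n" + failure_list,
        "Debugging guidance: (placeholder)",
        "Validation steps: (placeholder)",
    ])
    return {
        "title": f"Flaky test: {test_id}",  # hypothetical title format
        "description": description,
        "labels": ",".join(TOP_FLAKY_LABELS),  # comma-separated label names
    }
```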
Manual Follow-Up Steps
After issues are automatically created:
- Review: A development analytics engineer reviews all generated issues
- Validation: Confirms that issues highlight actual flaky tests (not false positives)
- Tracking: Adds the flaky-test-reviewed label to mark the issue as reviewed
- Assignment: Assigns validated issues to the appropriate stage groups for investigation and resolution
Triage using GitLab Duo Agent
The ci-alerts project now includes a Duo Chat template (internal) for triaging flaky tests and generating a structured report. Follow the steps below to triage an issue created for a flaky test using Duo:
Locally in a code editor
Open the ci-alerts project locally and give Duo Agentic Chat the following prompts:
- “Triage this flaky test issue <issue_url> using flaky-test-triage-report-template.md” to generate the report.
- If you are satisfied with the report details, prompt the Duo agent with “Post as a comment” to post the report as a comment on the issue.
In the browser
Open the Duo Chat template in the browser, select the required Duo Agentic Chat model (claude/chatgpt), and use the same prompts as above.
Links
Data Source
All test failure data is sourced from Snowflake queries that analyze historical pipeline failures and identify patterns indicating flaky behavior.
This process is maintained by the Development Analytics team. For questions or issues with the automation, contact the team in #g_development_analytics or create an issue in the ci-alerts (internal) project.