Automated Reporting of the most flaky specs
Overview
This document outlines the automated process for identifying, tracking, and reporting the most flaky specs in the gitlab-org/gitlab and gitlab-org/gitlab-foss projects. The system runs weekly to detect the most problematic flaky specs and creates issues for investigation by development teams.
How It Works
Schedule
- Frequency: Weekly on Sundays at 10:00 UTC
- Pipeline Schedule: `create-update-issues-for-top-flaky-tests` (internal)
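This cadence corresponds to the cron expression `0 10 * * 0`. As a minimal sketch, such a schedule could be created through the API with python-gitlab; the project path and token variable below are illustrative assumptions, since the real schedule is configured in the project's CI/CD settings:

```python
import os
import gitlab

# Connect to the GitLab instance; GITLAB_TOKEN is an illustrative variable name.
gl = gitlab.Gitlab("https://gitlab.com", private_token=os.environ["GITLAB_TOKEN"])
project = gl.projects.get("gitlab-org/gitlab")  # illustrative project path

# Weekly on Sundays at 10:00 UTC.
schedule = project.pipelineschedules.create({
    "description": "create-update-issues-for-top-flaky-tests",
    "ref": "master",
    "cron": "0 10 * * 0",
    "cron_timezone": "UTC",
})
print(schedule.id)
```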
Process Overview
1. Update Most Flaky Specs Job (update-most-flaky-specs)
This job maintains existing flaky spec issues by:
- Searching for open issues with the `automation:top-flaky-spec` label in the gitlab-org/quality/test-failure-issues (internal) project
- Querying Snowflake for new failures since each issue was created
- Adding comments with details about new failures to existing issues
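A minimal sketch of this flow, assuming python-gitlab for the issue API and the Snowflake Python connector for the failure lookup; the `test_failures` table, its columns, and the way the spec path is derived from the issue are assumptions, not the real internal schema:

```python
import os
import gitlab
import snowflake.connector

# Illustrative connections; the real credentials and query are internal.
gl = gitlab.Gitlab("https://gitlab.com", private_token=os.environ["GITLAB_TOKEN"])
project = gl.projects.get("gitlab-org/quality/test-failure-issues")
sf = snowflake.connector.connect(
    account=os.environ["SNOWFLAKE_ACCOUNT"],
    user=os.environ["SNOWFLAKE_USER"],
    password=os.environ["SNOWFLAKE_PASSWORD"],
)
cursor = sf.cursor()

open_issues = project.issues.list(
    state="opened", labels=["automation:top-flaky-spec"], iterator=True
)
for issue in open_issues:
    spec_path = issue.title  # assumes the spec path can be derived from the issue title
    # Hypothetical table/columns: failures for this spec since the issue was created.
    cursor.execute(
        "SELECT job_url, failed_at FROM test_failures "
        "WHERE spec_path = %s AND failed_at >= %s",
        (spec_path, issue.created_at),
    )
    new_failures = cursor.fetchall()
    if new_failures:
        body = "New failures since this issue was created:\n" + "\n".join(
            f"- {job_url} at {failed_at}" for job_url, failed_at in new_failures
        )
        issue.notes.create({"body": body})
```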
2. Report Most Flaky Specs Job (report-most-flaky-specs)
This job identifies new flaky specs by:
- Querying Snowflake for RSpec spec failure data for:
  - `gitlab-org/gitlab`
  - `gitlab-org/gitlab-foss`
- Filtering out failures related to:
  - Master broken incidents
  - Valid failures
- Identifying the top 10 most flaky specs based on pipeline impact
- Creating issues only for specs that don’t already have existing issues
Note: The filtering logic is not 100% accurate because it is difficult to reliably identify all failures related to master-broken incidents, as well as all valid failures. Manual review is therefore required, as described below in Manual Follow-Up Steps.
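As an illustration of the ranking and filtering step, here is a minimal sketch assuming the failure rows have already been fetched from Snowflake into dictionaries; the field names and the two filter predicates are placeholders for the real (imperfect) heuristics:

```python
from collections import Counter


def related_to_master_broken_incident(failure: dict) -> bool:
    # Placeholder heuristic: drop failures that occurred during a master-broken incident.
    return failure.get("master_broken_incident_id") is not None


def valid_failure(failure: dict) -> bool:
    # Placeholder heuristic: drop failures caused by genuine regressions.
    return failure.get("failure_category") == "valid"


def top_flaky_specs(failures: list[dict], limit: int = 10) -> list[tuple[str, int]]:
    """Rank specs by the number of distinct pipelines they affected."""
    pipelines_per_spec: dict[str, set] = {}
    for failure in failures:
        if related_to_master_broken_incident(failure) or valid_failure(failure):
            continue
        pipelines_per_spec.setdefault(failure["spec_path"], set()).add(failure["pipeline_id"])

    impact = Counter({spec: len(p) for spec, p in pipelines_per_spec.items()})
    return impact.most_common(limit)
```

Issues are then opened only for specs in this list that do not already have an open `automation:top-flaky-spec` issue.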
Issue Creation Details
Each automatically created issue includes:
Labels:
- `automation:top-flaky-spec`
- `automation:bot-authored`
- `backlog::prospective`
- `type::maintenance`
Content:
- Comprehensive description with test-specific details
- List of failures for up to 10 tests within a spec from the past week
- Debugging guidance and data collection instructions
- Validation steps for confirming when flakiness is fixed
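A minimal sketch of the issue creation call with these labels, using python-gitlab; the title and description skeleton are illustrative, not the exact template used by the automation:

```python
import os
import gitlab

LABELS = [
    "automation:top-flaky-spec",
    "automation:bot-authored",
    "backlog::prospective",
    "type::maintenance",
]

gl = gitlab.Gitlab("https://gitlab.com", private_token=os.environ["GITLAB_TOKEN"])
project = gl.projects.get("gitlab-org/quality/test-failure-issues")


def create_flaky_spec_issue(spec_path: str, failures_markdown: str):
    """Create a tracking issue for a flaky spec, assuming none exists yet."""
    return project.issues.create({
        "title": f"Flaky spec: {spec_path}",
        "description": (
            f"## Failures in the past week\n\n{failures_markdown}\n\n"
            "## Debugging guidance\n\n...\n\n"
            "## Validation steps\n\n...\n"
        ),
        "labels": LABELS,
    })
```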
Manual Follow-Up Steps
After issues are automatically created:
- Review: A development analytics engineer reviews all generated issues
- Validation: Confirms that issues highlight actual flaky specs (not false positives)
- Tracking: Adds the `flaky-test-reviewed` label to mark the issue as reviewed
- Assignment: Assigns validated issues to the appropriate stage groups for investigation and resolution
Triage using GitLab Duo Agent
The ci-alerts project now includes a Duo chat template (internal) to triage and generate a structured report for flaky specs. Follow the steps below to triage an issue created for a flaky spec using Duo:
Locally in a code editor
Open the ci-alerts project locally and ask Duo agentic chat the following questions:
- “Triage this flaky test issue <issue_url> using `flaky-test-triage-report-template.md`” to generate the report.
- If satisfied with the report details, prompt the Duo agent with “Post as a comment” to post the report as a comment on the issue.
In the browser
Open the Duo chat template in the browser, select the required Duo agentic chat model (Claude/ChatGPT), and ask the same questions as above.
Links
- Reviewed open issues for the most flaky specs (internal)
- All open issues for the most flaky specs (internal)
Legacy Links (Used to report failures per test, not per spec)
- Reviewed open issues for the most flaky tests (internal)
- All open issues for the most flaky tests (internal)
Data Source
All test failure data is sourced from Snowflake queries that analyze historical pipeline failures and identify patterns indicating flaky behavior.
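The exact queries are internal, but an illustrative aggregation over an assumed failures table could look like the following; the table and column names are placeholders:

```python
# Illustrative only: the real Snowflake schema and queries are internal.
WEEKLY_FLAKY_SPECS_SQL = """
SELECT
    spec_path,
    COUNT(DISTINCT pipeline_id) AS affected_pipelines,
    COUNT(*)                    AS failure_count
FROM test_failures
WHERE failed_at >= DATEADD(day, -7, CURRENT_TIMESTAMP())
  AND project_path IN ('gitlab-org/gitlab', 'gitlab-org/gitlab-foss')
GROUP BY spec_path
ORDER BY affected_pipelines DESC
LIMIT 10
"""
```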
This process is maintained by the Development Analytics team. For questions or issues with the automation, please contact the team in #g_development_analytics or create an issue in the ci-alerts (internal) project.
