# Engineering Productivity team

## Mission
- Constantly improve efficiency for our entire engineering team, ultimately increasing value for our customers.
- Measure what matters: quality of life, efficiency, and toil reduction improvements with quantitative and qualitative measures.
- Build partnerships across organizational boundaries to deliver broad efficiency improvements.
## Team

### Members
| Team Member | Role |
| --- | --- |
| Ethan Guo | Acting Engineering Manager |
| Alina Mihaila | Senior Backend Engineer, Engineering Productivity |
| David Dieulivol | Senior Backend Engineer, Engineering Productivity |
| Jennifer Li | Senior Backend Engineer, Engineering Productivity |
| Jen-Shin Lin | Senior Backend Engineer, Engineering Productivity |
| Nao Hashizume | Backend Engineer, Engineering Productivity |
| Peter Leitzen | Staff Backend Engineer, Engineering Productivity |
| Rémy Coutable | Principal Engineer, Infrastructure |
### Stable Counterpart
| Person | Role |
| --- | --- |
| Greg Alfaro | GDK Project Stable Counterpart, Application Security |
## Core Responsibilities
```mermaid
graph LR
  A[Engineering Productivity Team]
  A --> B[Planning & Reporting]
  B --> B1[Weekly team reports<br>Providing teams with an overview of their current, planned & unplanned work]
  click B1 "https://gitlab.com/groups/gitlab-org/quality/engineering-productivity/-/epics/32"
  B --> B2[Issues & MRs hygiene automation<br>Ensuring healthy issue/MR trackers]
  click B2 "https://gitlab.com/groups/gitlab-org/quality/engineering-productivity/-/epics/32"
  A --> C[Development Tools]
  C --> C1[GitLab Development Kit<br>Providing a reliable development environment]
  click C1 "https://gitlab.com/groups/gitlab-org/quality/engineering-productivity/-/epics/31"
  C --> C2[GitLab Remote Development<br>Providing a reliable remote development environment]
  click C2 "https://gitlab.com/groups/gitlab-org/quality/engineering-productivity/-/epics/31"
  A --> F[Review & CI]
  F --> F2[Merge Request Review Process<br>Ensuring a smooth, fast and reliable review process]
  click F2 "https://gitlab.com/groups/gitlab-org/quality/engineering-productivity/-/epics/34"
  F --> F3[Merge Request Pipelines<br>Providing fast and reliable pipelines]
  click F3 "https://gitlab.com/groups/gitlab-org/quality/engineering-productivity/-/epics/28"
  F --> F4[Review apps<br>Providing review apps to explore a merge request's changes]
  click F4 "https://gitlab.com/groups/gitlab-org/quality/engineering-productivity/-/epics/33"
  A --> D[Maintenance & Security]
  D --> D1[Automated dependency updates<br>Ensuring dependencies are up-to-date]
  click D1 "https://gitlab.com/groups/gitlab-org/quality/engineering-productivity/-/epics/40"
  D --> D2[Automated management of CI/CD secrets<br>Providing a secure CI/CD environment]
  click D2 "https://gitlab.com/groups/gitlab-org/quality/engineering-productivity/-/epics/46"
  D --> D3[Automated main branch failing pipelines management<br>Providing a stable `master` branch]
  click D3 "https://gitlab.com/groups/gitlab-org/quality/engineering-productivity/-/epics/30"
  D --> D4[Static analysis<br>Ensuring the codebase style and quality is consistent and reducing bikeshedding]
  click D4 "https://gitlab.com/groups/gitlab-org/quality/engineering-productivity/-/epics/38"
  D --> D5[Shared CI/CD components<br>Providing CI/CD components to ensure consistency in all GitLab projects]
  click D5 "https://gitlab.com/groups/gitlab-org/quality/engineering-productivity/-/epics/41"
  A --> G[JiHu Support]
  click G "https://gitlab.com/groups/gitlab-org/quality/engineering-productivity/-/epics/35"
```
- See it and find it: Build automated measurements and dashboards to gain insights into the productivity of the Engineering organization and identify opportunities for improvement.
  - Implement new measurements to provide visibility into improvement opportunities.
  - Collaborate with other Engineering teams to provide visualizations for measurement objectives.
  - Improve existing performance indicators.
- Do it for internal team: Increase contributor and developer productivity by making measurement-driven improvements to the development tools, workflows, and processes, then monitor the results and iterate.
  - Identify and implement quantifiable improvement opportunities, with proposals and hypotheses for metric improvements.
  - Automated merge request quality checks and code quality checks.
  - GitLab project pipeline improvements to improve efficiency, quality, or duration.
- Dogfood use: Dogfood GitLab product features to improve developer workflow and provide feedback to product teams.
  - Use new features from related product groups (Analytics, Monitor, Testing).
  - Improve usage of Review apps for GitLab development and testing.
- Engineering support:
  - `#master-broken` pipeline monitoring.
  - KPI corrective actions, such as Review Apps stabilization.
  - Merge Request Coach for ~"Community contribution" merge requests.
- Engineering workflow: Develop automated processes for improving label classification hygiene in support of product and Engineering workflows.
  - Automated issue and merge request triage.
  - Improvements to the labelling classification and automation used to support Engineering measurements.
  - See the `gitlab-triage` Ruby gem and the Triage operations projects for examples.
- Do it for wider community: Increase efficiency for wider GitLab Community contributions.
- Dogfood build: Enhance and add new features to the GitLab product to improve engineer productivity.
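The automated issue and merge request triage mentioned above is driven by declarative policies. As a rough sketch, and not actual team policy, a `gitlab-triage` policy file could look like the following (the label, interval, and comment text are illustrative):

```yaml
# .triage-policies.yml — illustrative policy, not the team's real one.
resource_rules:
  issues:
    rules:
      - name: Label stale open issues
        conditions:
          state: opened
          date:
            attribute: updated_at
            condition: older_than
            interval_type: months
            interval: 12
        actions:
          labels:
            - "awaiting feedback"
          comment: |
            This issue has seen no activity for 12 months and has been
            labelled for triage.
```

Policies like this are executed on a schedule by a CI pipeline in the Triage operations projects.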
## Metrics

### KPIs

Infrastructure Performance Indicators are our single source of truth.

### PIs

#### Shared
- Quality Handbook MR Rate
- Quality Department Promotion Rate
- Quality Department Discretionary Bonus Rate
### Dashboards
The Engineering Productivity team creates metrics in the following sources to aid in operational reporting.
- Engineering Productivity Collection
- Broken Master Pipeline Root Cause Analysis
- Time to First Failure
- Flaky test issues
- Test Intelligence Accuracy
- Engineering Productivity Pipeline Durations
- Engineering Productivity Jobs Durations
- Engineering Productivity Package And QA Durations (to be replaced in Tableau)
- GDK - Jobs Durations (to be replaced in Tableau)
- Issue Types Detail
- GitLab-Org Native Insights
- Review Apps monitoring dashboard
- Triage Reactive monitoring dashboards
## OKRs
Objectives and Key Results (OKRs) help align our sub-department around what really matters. They are set quarterly and are based on company OKRs. We follow the OKR process defined here.
Here is an overview of our current OKRs.
## Communication
| Description | Link |
| --- | --- |
| GitLab Team Handle | `@gl-quality/eng-prod` |
| Slack Channel | `#g_engineering_productivity` |
| Team Boards | Team Board & Priority Board |
| Issue Tracker | `gitlab-org/quality/engineering-productivity/team` |
### Office hours
Engineering Productivity holds office hours on the 3rd Wednesday of even months (e.g. February, April) at 3:00 UTC (20:00 PST). Anyone can add topics or questions to the agenda. Office hours can be found in the GitLab Team Meetings calendar.
### Meetings
Engineering Productivity has a weekly team meeting in two parts (EMEA / AMER), so that all team members can collaborate at times that work for them.
- Part 1 is Tuesdays 11:00 UTC, 04:00 PST
- Part 2 is Tuesdays 22:00 UTC, 15:00 PST
## Communication guidelines
The Engineering Productivity team will make changes which can create notification spikes or new behavior for GitLab contributors. The team will follow these guidelines in the spirit of GitLab’s Internal Communication Guidelines.
### Pipeline changes

#### Critical pipeline changes
Pipeline changes that have the potential to have an impact on the GitLab.com infrastructure should follow the Change Management process.
Pipeline changes that meet the following criteria must follow the Criticality 3 process:
- updates to the `cache-repo` job

These kinds of changes have led to production issues in the past.
#### Non-critical pipeline changes
The team will communicate significant pipeline changes in the `#development` Slack channel and the Engineering Week in Review.
Pipeline changes that meet the following criteria will be communicated:
- addition, removal, renaming, parallelization of jobs
- changes to the conditions to run jobs
- changes to pipeline DAG structure
Other pipeline changes will be communicated based on the team’s discretion.
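For example, a `.gitlab-ci.yml` change like the following narrows the conditions under which a job runs, so it falls under "changes to the conditions to run jobs" and would be communicated. The job name and paths are illustrative, not actual GitLab jobs:

```yaml
# Illustrative job; previously it ran in every merge request pipeline.
# After this change it only runs when Ruby files or the RuboCop
# configuration change, which alters when the job appears in pipelines.
rubocop:
  script: bundle exec rubocop
  rules:
    - if: '$CI_PIPELINE_SOURCE == "merge_request_event"'
      changes:
        - "**/*.rb"
        - ".rubocop.yml"
```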
### Automated triage policies
Be sure to give a heads-up in the `#development`, `#eng-managers`, `#product`, and `#ux` Slack channels, and in the Engineering Week in Review, when an automation is expected to triage more than 50 notifications or to change policies that a large stakeholder group uses (e.g. the team-triage report).
## Experiments
This is a list of Engineering Productivity experiments, in which we identify an opportunity, form a hypothesis, and experiment to test that hypothesis.
| Experiment | Status | Hypothesis | Feedback Issue or Findings |
| --- | --- | --- | --- |
| Automatic issue creation for test failures | Complete | Tracking each failing test in `master` with an issue allows us to later automatically quarantine tests. | Feedback issue. |
| Always run predictive jobs for fork pipelines | Complete | Reduce the compute minutes consumed by fork pipelines. The "full" jobs only run for canonical pipelines (i.e. pipelines started by a member of the project) once the MR is approved. | |
| Retry failed specs in a new process after the initial run | Complete | Because many flaky tests are unreliable due to previous tests affecting global state, retrying only the failing specs in a new RSpec process should result in a better overall success rate. | Results show that this is useful. |
| Experiment with automatically skipping identified flaky tests | Complete - Reverted | Skipping flaky tests should reduce the number of false broken `master` incidents and increase the `master` success rate. | We found that it can actually break `master` in some cases, so we reverted the experiment with gitlab-org/gitlab!111217. |
| Experiment with running previously failed tests early | Complete | | We have not noticed a significant improvement in feedback time, due to other factors impacting our Time to First Failure metric. |
| Store/retrieve tests metadata in/from Pages instead of artifacts | Complete | We're only interested in the latest state of these files, so using Pages makes sense here. This simplifies the logic to retrieve the reports and reduces the load on GitLab.com's infrastructure. | This has been enabled since 2022-11-09. |
| Reduce pipeline cost by reducing number of RSpec tests before MR approval | Complete | Reduce the CI cost for GitLab pipelines by running the most applicable RSpec tests for changes prior to approval. | Improvements are needed to identify and resolve selective test gaps, as this impacted pipeline stability. |
| Enabling developers to run failed specs locally | Complete | Enabling developers to run failed specs locally will lead to fewer pipelines per merge request, and to improved productivity from being able to fix regressions more quickly. | Feedback issue. |
| Use dynamic analysis to streamline test execution | Complete | Dynamic analysis can reduce the number of specs needed for MR pipelines without causing significant disruption to `master` stability. | A miss rate of 10% would cause a large impact on `master` stability. Look to leverage dynamic mapping with local developer tooling. Added documentation from the experiment. |
| Using timezone for Reviewer Roulette suggestions | Complete - Reverted | Using timezones in Reviewer Roulette suggestions will lead to a reduction in the mean time to merge. | Reviewer burden was inconsistently applied, and specific reviewers were getting too many reviews compared to others. More details are in the experiment issue and feedback issue. |
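Several of these experiments (retrying failed specs in a new process, enabling developers to run failed specs locally) depend on extracting the set of failing spec files from a previous run. A minimal sketch, assuming the standard RSpec JSON formatter report (`rspec --format json`) rather than the team's actual tooling:

```ruby
require "json"

# Collect the unique spec files that contain failures from an RSpec JSON
# report, so they can be rerun (locally, or in a fresh process where no
# global state from the first run can leak in).
def failed_spec_files(report_json)
  report = JSON.parse(report_json)
  report.fetch("examples", [])
        .select { |example| example["status"] == "failed" }
        .map { |example| example["file_path"] }
        .uniq
end
```

A retry step could then shell out to `bundle exec rspec` with just these files in a new process.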
- Engineering productivity project management
- Flaky tests management and processes
- Issue Triage
- Test Intelligence
- Triage Operations
- Wider Community Merge Request Triage
- Workflow Automation