Runner Group - Risk Map

The goal of this page is to create, share, and iterate on the general Risk Map for the Runner group.

Utilise the Risk Map as a tool to:

  • Understand the risks the team faces
  • Increase transparency on mitigation plans
  • Effectively allocate limited resources
  • Collaborate strategically in improving Quality

General Risk Map

Map key

  • Impact - what happens if the risk is not mitigated or eliminated
  • Impact level - Rate 1 (LOW) to 5 (HIGH)
  • Probability - Rate 1 (LOW) to 5 (HIGH)
  • Priority - Impact level x Probability. Address the highest score first.
  • Mitigation - what could be done to lower the impact or probability
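The scoring rule in the map key can be sketched in a few lines of Python. This is a minimal illustration, not a tool the team uses: the `Risk` class, the `rank` helper, and all of the numeric scores below are hypothetical, since the map itself does not yet record impact levels or probabilities.

```python
from dataclasses import dataclass


@dataclass
class Risk:
    """One row of the risk map (hypothetical structure for illustration)."""
    area: str
    description: str
    impact_level: int  # 1 (LOW) to 5 (HIGH)
    probability: int   # 1 (LOW) to 5 (HIGH)

    def __post_init__(self) -> None:
        # Reject scores outside the 1-5 scale defined in the map key.
        for name in ("impact_level", "probability"):
            value = getattr(self, name)
            if not 1 <= value <= 5:
                raise ValueError(f"{name} must be between 1 and 5, got {value}")

    @property
    def priority(self) -> int:
        # Priority = Impact level x Probability
        return self.impact_level * self.probability


def rank(risks):
    """Return risks ordered by priority, highest score first."""
    return sorted(risks, key=lambda r: r.priority, reverse=True)


# Scores below are made up purely to demonstrate the arithmetic.
risks = [
    Risk("Team/Stability", "Burn out", impact_level=4, probability=3),
    Risk("Feature/Delivery", "Technical debt", impact_level=3, probability=5),
    Risk("Team/Scaling", "Slow pipelines", impact_level=2, probability=4),
]

for risk in rank(risks):
    print(f"{risk.priority:>2}  {risk.area}: {risk.description}")
```

With these example scores, "Technical debt" (3 x 5 = 15) ranks ahead of "Burn out" (4 x 3 = 12) and "Slow pipelines" (2 x 4 = 8), so it would be addressed first.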
| Risk Area | Risk Description | Impact | Impact Level | Probability | Priority | Mitigation |
|---|---|---|---|---|---|---|
| Team/Stability | Burn out | Low productivity and attrition | | | | Minimise overloading and blockers |
| Team/Scaling | Inefficient team member onboarding | Prolonged low productivity | | | | Clear onboarding guidance and prioritisation |
| Team/Expertise | Concentration of knowledge | | | | | Documenting process and knowledge |
| Quality/Coverage | Uncertain test coverage | Escaped bugs | | | | Test coverage analysis and coverage automation |
| Quality/Coverage | Sufficient test coverage exercising binaries across supported compute architectures and OSes | Escaped bugs, damage to reputation by inability to claim support due to lack of test coverage | | | | Integration-level test environment and respective test framework |
| Quality/Coverage | Sufficient test coverage exercising released images | Escaped bugs, damage to reputation due to inability to claim releases were tested | | | | Integration-level test environment and respective test framework |
| Quality/Infrastructure | Ability to effectively test at release | Escaped bugs | | | | Reference platforms and standard test harness |
| Feature/Dependencies | Bugs in third-party dependencies | Bug triage, escaped bugs, failure to execute pipelines | | | | Sufficient test coverage against latest supported version |
| Feature/Compatibility | Changes in third-party dependencies | Bug triage, escaped bugs, failure to execute pipelines | | | | Testing against multiple dependency versions |
| Feature/Function | Functional requirements not met for teams at scale | Low customer satisfaction for key customers | | | | |
| Team/Workload | Toil work | Small tasks that should take a few minutes take hours, creating a backlog of reviews/deliverables | | | | |
| Team/Scaling | Slow pipelines | It takes a long time to get feedback on a pipeline and for maintainers to merge changes | | | | |
| Feature/Delivery | Technical debt | Accumulated technical debt makes it hard to deliver a feature on time | | | | |
| Feature/Delivery | Slow deployment process | Context switching on a feature that you merged weeks ago | | | | |
| Feature/Dependencies | Not updating third-party code | Results in bugs or slow feature delivery, since many dependencies have to be updated first | | | | |
| Observability | Lack of standard on logging | When debugging a problem in production, it can be hard to sift through logs if you can’t see what is going on in the application | | | | |
| Observability | Metrics/Dashboards | Hard to debug and understand what is going on in production | | | | |
| Feature/Compatibility | Insufficient testing on supported cloud platforms | Inability to claim compatibility and feature failures on customer runners | | | | Integration-level test environment and framework |

Last modified June 27, 2024: Fix various vale errors (46417d02)