Runner Group - Risk Map
The goal of this page is to document a general risk map for the Runner group.
Overview
The goal of this page is to create, share, and iterate on the Risk Map for the Runner team.
Goals
Utilise the Risk Map as a tool to:
- Understand the risks the team faces
- Increase transparency on mitigation plans
- Effectively allocate limited resources
- Collaborate strategically in improving Quality
General Risk Map
Map key
- Impact - what happens if the risk is not mitigated or eliminated
- Impact level - Rate 1 (LOW) to 5 (HIGH)
- Probability - Rate 1 (LOW) to 5 (HIGH)
- Priority - Impact level x Probability. Address the highest score first.
- Mitigation - what could be done to lower the impact or probability
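The scoring rule in the map key can be sketched in a few lines of Python. This is a minimal illustration, not part of the team's tooling: the `Risk` class, the `rank` helper, and the ratings in `example` are hypothetical placeholders, since the map itself does not yet record impact levels or probabilities.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Risk:
    """One row of the risk map (hypothetical structure for illustration)."""
    area: str
    description: str
    impact_level: int  # 1 (LOW) to 5 (HIGH)
    probability: int   # 1 (LOW) to 5 (HIGH)

    def __post_init__(self) -> None:
        # Reject ratings outside the 1-5 scale defined in the map key.
        for name in ("impact_level", "probability"):
            value = getattr(self, name)
            if not 1 <= value <= 5:
                raise ValueError(f"{name} must be between 1 and 5, got {value}")

    @property
    def priority(self) -> int:
        # Priority = Impact level x Probability.
        return self.impact_level * self.probability


def rank(risks):
    """Return risks ordered so the highest priority score comes first."""
    return sorted(risks, key=lambda r: r.priority, reverse=True)


# Placeholder ratings for illustration only; they are not the team's assessment.
example = [
    Risk("Team/Stability", "Burnout", impact_level=4, probability=2),
    Risk("Team/Scaling", "Slow pipelines", impact_level=3, probability=4),
]

for risk in rank(example):
    print(f"{risk.priority:>2}  {risk.area}: {risk.description}")
```

With both ratings on a 1 to 5 scale, priority scores range from 1 to 25, and risks with equal scores keep their original relative order because `sorted` is stable.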
Risk Area | Risk Description | Impact | Impact Level | Probability | Priority | Mitigation |
---|---|---|---|---|---|---|
Team/Stability | Burnout | Low productivity and attrition | | | | Minimise overloading and blockers |
Team/Scaling | Inefficient team member onboarding | Prolonged low productivity | | | | Clear onboarding guidance and prioritisation |
Team/Expertise | Concentration of knowledge | | | | | Documenting process and knowledge |
Quality/Coverage | Uncertain test coverage | Escaped bugs | | | | Test coverage analysis and coverage automation |
Quality/Coverage | Insufficient test coverage exercising binaries across supported compute architectures and OSes | Escaped bugs, damage to reputation from inability to claim support due to lack of test coverage | | | | Integration-level test environment and respective test framework |
Quality/Coverage | Insufficient test coverage exercising released images | Escaped bugs, damage to reputation from inability to claim releases were tested | | | | Integration-level test environment and respective test framework |
Quality/Infrastructure | Inability to test effectively at release | Escaped bugs | | | | Reference platforms and standard test harness |
Feature/Dependencies | Bugs in third-party dependencies | Bug triage, escaped bugs, failure to execute pipelines | | | | Sufficient test coverage against latest supported version |
Feature/Compatibility | Changes in third-party dependencies | Bug triage, escaped bugs, failure to execute pipelines | | | | Testing against multiple dependency versions |
Feature/Function | Functional requirements not met for teams at scale | Low customer satisfaction for key customers | | | | |
Team/Workload | Toil work | Small tasks that should take a few minutes take hours, creating a backlog of reviews and deliverables | | | | |
Team/Scaling | Slow pipelines | Long waits for pipeline feedback and for maintainers to merge changes | | | | |
Feature/Delivery | Technical debt | Accumulated technical debt makes it hard to deliver features on time | | | | |
Feature/Delivery | Slow deployment process | Context switching back to a feature merged weeks earlier | | | | |
Feature/Dependencies | Not updating third-party code | Bugs or slow feature delivery, because many dependencies must be updated first | | | | |
Observability | Lack of a logging standard | Hard to sift through logs when debugging a production problem, because it is unclear what the application is doing | | | | |
Observability | Metrics/Dashboards | Hard to debug and understand what is going on in production | | | | |
Feature/Compatibility | Insufficient testing on supported cloud platforms | Inability to claim compatibility, and feature failures on customer runners | | | | Integration-level test environment and framework |
Last modified June 27, 2024: Fix various vale errors (46417d02)