Development Department Learning and Development - Reliability
Goal of this training
With engineering's renewed focus on reliability to reduce outages, we have made many changes to the handbook, production documentation, and our processes. Although we announced them through multiple channels (EWIR, Slack, email, meetings), it is likely that not everyone has seen and internalized all of the important changes.
We want to gather all the crucial changes in one place, explain why we made them, summarize them, and link to where you can find more information.
This material is available as a learning pathway on GitLab’s Level Up.
Introduction
Amplifying SaaS Reliability Focus
Reliability & Security Standup
The business impact of reliability
Importance of reliability to the business
Impact of reliability on users
Improving SUS - slides 9 through 14 in particular
Updates to values
MR to change quality and reliability
MR around things that don’t scale
Blameless Culture
Google SRE Book: Blameless culture
Limiting the impact of far-reaching work
Limiting the impact of far-reaching work
Overview of Risk Mapping
MR acceptance checklist
Updates to the definition of done
Backwards Compatibility
Course on backwards compatibility
How to use the stage group dashboards to understand how a feature category performs
Stage group dashboard documentation
Error budgets
Feature Change Locks (FCL)
Added past due infradev as a KPI
Overview of Engineering Metrics Dashboards
Feedback on the training
- What did you like about the training?
- What did you not like that we should improve?
Add your comments in this feedback issue.