Measuring Success of the Tier 2 On-Call Program

How do we know if the Tier 2 on-call program is working? We measure it through specific metrics that reflect both operational excellence and engineer well-being. This page explains what we track and why it matters.

Core Success Metrics

Our program focuses on three interconnected metrics that together tell the story of incident response quality:

Time to Declare (TTDec)

What it measures: How quickly do we recognize a problem and formally declare it as an incident?

Why it matters: The faster we declare an incident, the faster we activate our full response machinery. A long gap between when something breaks and when we declare it means we’re already losing ground.

What it shows: Good TTDec indicates strong observability, clear ownership, and alert systems that work. Poor TTDec suggests blind spots in our monitoring.

Target: Declare incidents within 5-10 minutes of detection for critical services

How Tier 2 helps: You validate early alert signals, confirm severity, and lead the decision to formally declare incidents promptly.

Time to Fix (TTFix)

What it measures: How long from when we declare an incident to when it’s actually resolved?

Why it matters: This is what customers care about most. Shorter fix times mean less business impact and better reliability.

What it shows: Good TTFix indicates effective troubleshooting, strong runbooks, and skilled on-call engineers. Poor TTFix suggests we need better tools, documentation, or training.

Target: 30 minutes or less for most incidents; under 5 minutes for critical issues

How Tier 2 helps: You own the technical resolution, execute runbooks, coordinate with other teams, and validate the fix through testing.

Total Incident Duration

What it measures: Total elapsed time from when an incident starts to when it’s completely resolved (including verification and monitoring).

Why it matters: This captures the full window of customer impact. It includes detection time, declaration time, fix time, and verification.

What it shows: Trends in incident duration over time reveal whether we’re getting better at preventing and responding to issues.

Target: Reduce overall incident duration by 20-30% within the first year of the program

How Tier 2 helps: You coordinate the response, update status milestones in real time, and ensure proper verification before declaring resolution.
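All three timing metrics above can be derived from a handful of incident timestamps. A minimal sketch in Python (the field names and example times are illustrative, not taken from any real incident tool):

```python
from datetime import datetime

def incident_timings(detected_at, declared_at, resolved_at, verified_at):
    """Derive the three core timing metrics from incident timestamps."""
    return {
        "ttdec": declared_at - detected_at,   # Time to Declare
        "ttfix": resolved_at - declared_at,   # Time to Fix
        "total": verified_at - detected_at,   # Total Incident Duration
    }

timings = incident_timings(
    detected_at=datetime(2025, 10, 19, 9, 0),
    declared_at=datetime(2025, 10, 19, 9, 7),
    resolved_at=datetime(2025, 10, 19, 9, 32),
    verified_at=datetime(2025, 10, 19, 9, 45),
)
print(timings["ttdec"])  # 0:07:00 — within the 5-10 minute declaration target
```

Note that total duration is anchored on detection, not declaration, so it always includes verification time on top of TTDec and TTFix.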

Composite Metrics: Reading the Patterns

When you look at all three metrics together, they tell you what’s really happening in the system. Here’s how to interpret patterns:

All Three Metrics Improving (TTDec ↓, TTFix ↓, duration ↓)

What it means: Excellent incident management. We’re detecting problems quickly, declaring them promptly, fixing them fast, and minimizing total impact.

What’s working: Strong observability, clear ownership, effective runbooks, skilled team.

Action: Maintain and continue improving. This is the goal state.

Fast Declaration but Slow Fix

What it means: TTDec ↓ but TTFix ↑ — We’re detecting problems quickly, but taking too long to resolve them.

What’s wrong: Our runbooks might be incomplete, the team may lack expertise, or we need better tools and access.

Action: Invest in runbook quality, provide training, and audit escalation paths.

Slow Declaration but Quick Fix

What it means: TTDec ↑ but TTFix ↓ — We’re slow to detect problems, but once we do, we fix them fast.

What’s wrong: Gaps in our monitoring and alerting. We’re not seeing problems until they’re severe.

Action: Audit observability, add missing metrics and dashboards, improve alert thresholds.

Both Slow (TTDec ↑ and TTFix ↑)

What it means: We’re detecting problems late and taking too long to fix them.

What’s wrong: This requires comprehensive improvement—both monitoring and response capabilities need work.

Action: Phase 1: Improve observability. Phase 2: Improve troubleshooting. Both are critical.
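The four patterns above reduce to a simple decision table on the two trends. A hypothetical sketch of that mapping (not a real dashboard integration):

```python
def diagnose_trend(ttdec_improving: bool, ttfix_improving: bool) -> str:
    """Map the TTDec/TTFix trend combination to the patterns described above."""
    if ttdec_improving and ttfix_improving:
        return "goal state: maintain and keep improving"
    if ttdec_improving:  # fast declaration, slow fix
        return "invest in runbook quality, training, and escalation paths"
    if ttfix_improving:  # slow declaration, quick fix
        return "audit observability, add missing metrics, improve alert thresholds"
    return "both slow: improve observability first, then troubleshooting"

print(diagnose_trend(ttdec_improving=True, ttfix_improving=False))
```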

On-Call Health Metrics

Beyond incident response metrics, we also measure the health and sustainability of the on-call program itself.

Rotation Frequency

What we measure: How often is each engineer on-call?

Target: Maximum 1 week per month per engineer

Why it matters: On-call must be sustainable. If someone is on-call too frequently, they’ll burn out.

What to do if high: Add team members to the rotation, improve alerting to reduce pages, or distribute coverage differently.

Alert Volume (Pages Per Shift)

What we measure: How many times does a Tier 2 engineer get paged during their shift?

Target: 2-5 pages per shift is typical; varies by service

Why it matters: Too many pages = alert fatigue. Too few = maybe we’re not detecting real issues.

What to do if too high: Tune alert thresholds, fix noisy monitoring, remove false alarms.

What to do if too low: Verify we’re not missing real issues; check alert coverage.
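Checking a shift against the typical band is a one-liner worth automating. A minimal sketch, assuming the 2-5 band above (the band is a tuning parameter, not a fixed rule):

```python
def check_alert_volume(pages_per_shift: int, low: int = 2, high: int = 5) -> str:
    """Flag shifts outside the typical pages-per-shift band."""
    if pages_per_shift > high:
        return "too high: tune thresholds, fix noisy monitors, remove false alarms"
    if pages_per_shift < low:
        return "too low: verify alert coverage isn't missing real issues"
    return "within typical range"

print(check_alert_volume(8))
```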

Escalation Accuracy

What we measure: When Tier 2 escalates an incident, are they escalating to the right team?

Why it matters: Escalating to the wrong person wastes time. Escalating to the right person fixes the issue faster.

Target: 90%+ of escalations go to the correct team on first try

What to do if low: Improve escalation decision tree, clarify when to escalate, provide more context in runbooks.
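Escalation accuracy is simply the fraction of escalations that reached the correct team on the first try. A sketch with a made-up record shape (the `first_try_correct` field is illustrative):

```python
def escalation_accuracy(escalations: list[dict]) -> float:
    """Fraction of escalations routed to the correct team on the first try."""
    correct = sum(1 for e in escalations if e["first_try_correct"])
    return correct / len(escalations)

# 9 correct out of 10 sample escalations
sample = [{"first_try_correct": True}] * 9 + [{"first_try_correct": False}]
accuracy = escalation_accuracy(sample)
print(f"{accuracy:.0%}")  # 90% — right at the target
```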

Burnout Prevention Metrics

Quarterly Surveys

What we ask:

  • “How sustainable is your on-call workload?”
  • “Do you feel supported when on-call?”
  • “Have you experienced on-call-related stress or fatigue?”
  • “Would you recommend this program to new team members?”

Target: 80%+ of engineers rate on-call as sustainable

What to do if scores are low: Investigate immediately. On-call burnout is serious and requires action.

Learning and Improvement Metrics

Incident Knowledge Capture

What we measure: For critical incidents (S1/S2), do we create documented learning?

Target: 100% of S1/S2 incidents have a retrospective or formal write-up

Why it matters: Incidents are learning opportunities. If we don’t capture what we learned, we’ll repeat the same mistakes.

Runbook Usage

What we measure: Are engineers actually using runbooks during incidents? Are new runbooks created after incidents?

Target: 80%+ of incident reports reference a runbook

Why it matters: Runbooks save time and reduce errors. If they’re not being used, we’re missing an opportunity.

Runbook Coverage

What we measure: How many documented runbooks/playbooks exist for common scenarios?

Target: Minimum 15-20 core runbooks covering 80% of common incidents

Why it matters: Having documented procedures speeds up response and reduces toil.

Fair Distribution Metrics

Escalation Patterns

What we measure: Do certain teams get escalated to more often? Is load spread fairly?

Why it matters: If one team is always escalating to the same group, it indicates either a real problem (that group owns critical services) or a routing issue.

Action: Monitor patterns, rebalance quarterly if needed.

Baseline vs. Target Metrics

Your rotation leader has established both baseline and target metrics for your program. Understanding the difference between them matters:

Baseline Metrics (Current State)

These are measured before or at the start of Tier 2:

  • “Currently, alerts page EMs an average of 20 times per week”
  • “Average time to fix is 45 minutes”
  • “Incidents last an average of 2 hours from start to resolution”

Target Metrics (Goal State)

These are what we’re aiming for:

  • “With Tier 2, on-call specialists will be paged an average of 5 times per week”
  • “Average time to fix will be 20 minutes”
  • “Incidents will last an average of 30 minutes”
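Progress from baseline to target can be expressed as a single fraction, which is handy for a dashboard. A sketch for metrics we want to drive down, using the illustrative TTFix numbers above (45-minute baseline, 20-minute target):

```python
def progress_to_target(baseline: float, current: float, target: float) -> float:
    """How far the current value sits between baseline and target (lower is better)."""
    if baseline == target:
        return 1.0  # nothing to improve
    return (baseline - current) / (baseline - target)

# TTFix has dropped from 45 min to 30 min against a 20-min target.
print(progress_to_target(baseline=45, current=30, target=20))  # 0.6 → 60% of the way
```

Values above 1.0 mean the target has been beaten; negative values mean the metric has regressed past the baseline.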

Progress Dashboard

Your team should have a dashboard showing:

  • Current metrics vs. targets
  • Trend over time (are we improving?)
  • Metrics by team or service
  • Areas where we’re ahead of target vs. behind

How to Read the Metrics

Metrics Improving

If metrics are improving (TTDec down, TTFix down, total duration down), you’re winning. Celebrate this. But also:

  • Ask what’s working and keep doing it
  • Identify what changed and document it
  • Share best practices with the team

Metrics Plateau or Get Worse

If metrics stop improving or get worse:

  • Don’t panic. Investigate why.
  • Did something change (new service, team change, alert tuning)?
  • Is it a temporary spike or a trend?
  • What action is needed?

Using Metrics to Improve

Metrics aren’t about blame. They’re about identifying where to focus effort:

  • “Alert volume is 3x what we expected; we should tune thresholds”
  • “TTFix is high for database issues; we need better database runbooks”
  • “Some services scale well in Tier 2, others struggle; let’s learn why”

Tier 2 Program Success Criteria

Beyond individual metrics, the program itself succeeds when:

Foundation & Standardization (Phase 1)

  • All Tier 2 rotations are mapped with clear ownership
  • Escalation paths are documented and accessible
  • Incident taxonomy is standardized
  • Team members are trained on Tier 2 processes
  • Structured retrospectives are completed for escalated incidents

Enhancement & Integration (Phase 2)

  • Process audit completed with identified Duo use cases
  • 15-20 core runbooks/playbooks documented and in use
  • At least 80% of team members aware of and using runbooks
  • Baseline and target metrics defined
  • Evidence that incidents reference runbooks and learnings
  • At least 30% reduction in time spent on manual toil

Cultural Success Indicators

Beyond metrics, we measure success by culture:

Blameless Incident Response

  • Retrospectives focus on systems, not people
  • Engineers feel safe escalating when needed
  • Learning is celebrated, not punished

Knowledge Sharing

  • Runbooks are continuously updated
  • Team members teach each other
  • Patterns are recognized and prevented

Sustainability

  • Engineers don’t burn out from on-call
  • People feel on-call is manageable and educational
  • On-call experience is valued in career growth

Your Role in Measuring Success

As an on-call engineer, you contribute by:

  • Providing honest feedback on your experience
  • Referencing runbooks and noting when they help or fail
  • Participating in retrospectives authentically
  • Suggesting improvements based on what you see
  • Celebrating wins and learning from failures
Last modified October 19, 2025: drs-add-landing-page-tier2 (6f2cba79)