Measuring Success

How do we know if the Tier 2 on-call program is working? We measure it through specific metrics that reflect both operational excellence and engineer well-being. This page explains what we track and why it matters.

Core Success Metrics

  1. Reduction in time to resolve: The primary purpose of expanding to Tier 2 is to give on-call engineers access to subject matter expertise so that incidents are resolved faster. This makes time to resolve the primary Tier 2 metric in our overall incident response.
  2. Escalation accuracy: 90%+ of escalations go to the correct team on the first try, thanks to usable error messages, stack traces, observability categorization, etc.
  3. Zero pages to Tier 2, because of the resiliency of our system and/or the effectiveness of our runbooks.
  4. No escalations past Tier 2, because we always respond in under 15 minutes.
  5. Sustainable on-call schedules: Engineers are on call no more than 1 week per month.
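
As a rough illustration of how the five metrics above could be computed from incident data, here is a minimal sketch. Everything in it is an assumption made for illustration: the `Incident` record, its field names, and the `summarize` and `over_on_call_limit` helpers are hypothetical and do not reflect our actual tooling. It also assumes every record represents an escalated incident, so that escalation accuracy can be taken over the whole list.

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import Optional

# Targets taken from the list above.
ACCURACY_TARGET = 0.90                # 90%+ of escalations correct on first try
ACK_TARGET = timedelta(minutes=15)    # Tier 2 responds in under 15 minutes
MAX_WEEKS_PER_MONTH = 1               # on call no more than 1 week per month


@dataclass
class Incident:
    """Hypothetical incident record; field names are illustrative only."""
    time_to_resolve: timedelta
    first_escalation_correct: bool
    paged_tier2: bool
    tier2_ack_delay: Optional[timedelta] = None  # None when Tier 2 was not paged
    escalated_past_tier2: bool = False


def summarize(incidents):
    """Return metrics 1-4 for a non-empty list of escalated incidents."""
    if not incidents:
        raise ValueError("need at least one incident")
    total = len(incidents)
    mean_ttr = sum((i.time_to_resolve for i in incidents), timedelta()) / total
    accuracy = sum(i.first_escalation_correct for i in incidents) / total
    late_acks = sum(
        1 for i in incidents
        if i.tier2_ack_delay is not None and i.tier2_ack_delay >= ACK_TARGET
    )
    return {
        "mean_time_to_resolve": mean_ttr,
        "escalation_accuracy": accuracy,
        "meets_accuracy_target": accuracy >= ACCURACY_TARGET,
        "tier2_pages": sum(i.paged_tier2 for i in incidents),
        "late_tier2_acks": late_acks,
        "escalations_past_tier2": sum(i.escalated_past_tier2 for i in incidents),
    }


def over_on_call_limit(weeks_by_engineer):
    """Return engineers scheduled for more than the monthly limit (metric 5)."""
    return sorted(
        name for name, weeks in weeks_by_engineer.items()
        if weeks > MAX_WEEKS_PER_MONTH
    )
```

A summary like this would typically be produced once per review period and compared against the previous period, since metric 1 is about the reduction in time to resolve rather than its absolute value.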

Last modified November 17, 2025: Condense Tier 2 measurements to Top 5 (188cd333)