Rotation Leaders

Welcome to the Rotation Leader role for the DevOps Tier 2 On-Call program. As a Rotation Leader, you are responsible for the health, fairness, and effectiveness of your team’s on-call rotation. This guide outlines your core responsibilities and how to execute them.

Your Role and Responsibilities

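Rotation Leaders are expected to manage the schedule, onboard new team members, and keep improving runbooks and incident follow-up. Each area is described below.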

Managing the Schedule

Building and Maintaining the Rotation

While general guidance is provided, you are responsible for the overall structure and composition of the rotation:

  • Target 8 people per region (APAC, EMEA, AMER) for a balanced workload and scheduling flexibility, with a minimum of 6
  • Reassess the team structure before a rotation grows beyond 12 people
  • With 6-12 people, each engineer is on call one week in every 6-12 weeks, or roughly 22-43 working days per year
  • Publish schedules at least one month in advance so team members can plan
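The 22-43 day range can be reproduced with a short calculation. The Python sketch below assumes a 52-week year and assumes that one on-call week counts as 5 working days. That second assumption is what reproduces the figures above. The function name on_call_load is illustrative only.

```python
# Worked arithmetic for the rotation-size guidance.
# Assumptions: 52 weeks per year, and one on-call week = 5 working days.
WEEKS_PER_YEAR = 52
WORKING_DAYS_PER_SHIFT = 5
MIN_SIZE, TARGET_SIZE, MAX_SIZE = 6, 8, 12


def on_call_load(team_size: int) -> tuple[float, int]:
    """Return (on-call weeks per person per year, working days per person per year)."""
    if not MIN_SIZE <= team_size <= MAX_SIZE:
        raise ValueError(
            f"team size {team_size} is outside the {MIN_SIZE}-{MAX_SIZE} range"
        )
    weeks = WEEKS_PER_YEAR / team_size  # one shift per person per full cycle
    return weeks, round(weeks * WORKING_DAYS_PER_SHIFT)


if __name__ == "__main__":
    for size in (MIN_SIZE, TARGET_SIZE, MAX_SIZE):
        weeks, days = on_call_load(size)
        print(f"{size:>2} people: ~{weeks:.1f} weeks, ~{days} working days per year")
```

A team of 6 gives about 43 working days per year and a team of 12 gives about 22, which matches the range above. At the target size of 8, each person covers 6.5 weeks per year.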

The Schedule

To publish, manage, or view the schedule, which includes the AMER, EMEA, and APAC DevOps rotations:

Coverage Hours

See coverage expectations here.

Public Holidays

See here.

Regular Reviews

Every quarter, review the rotation against the following questions:

  • Do we need more Subject Matter Experts?
  • How many times was each person on-call? Was anyone on-call more than once every 4 weeks? How fairly were shifts distributed?
  • How many times were team members paged per shift?
  • Did anyone burn out or report unsustainable load?
  • Did you meet your coverage and response time goals for each region?
  • Are any services generating too few pages (potential coverage gaps) or too many (alert fatigue)?
  • Time to fix: how long did incidents take from declaration to resolution?
  • Are there patterns in what gets escalated, by whom, and why?
  • Are escalations going to the right teams on first try (target 90%+)?
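Several of these questions can be answered from shift and escalation exports. The following Python sketch uses hypothetical Shift and Escalation records. Their field names are assumptions and do not come from incident.io or any other tool. The sketch interprets "more than once every 4 weeks" as two consecutive shifts that start less than 28 days apart.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional

# Hypothetical record shapes; map these from whatever your scheduling and
# incident tools export. The field names are illustrative assumptions.
@dataclass
class Shift:
    engineer: str
    region: str   # "APAC", "EMEA", or "AMER"
    start: date
    pages: int    # pages received during the shift

@dataclass
class Escalation:
    incident_id: str
    first_team: str   # team the escalation was routed to first
    owning_team: str  # team that ultimately owned the fix

MIN_GAP = timedelta(weeks=4)   # cap: at most one shift every 4 weeks
FIRST_TRY_TARGET = 0.90        # target for correct first-try routing

def shifts_per_engineer(shifts: list[Shift]) -> Counter:
    """How many shifts each engineer took in the period."""
    return Counter(s.engineer for s in shifts)

def cap_violations(shifts: list[Shift]) -> set[str]:
    """Engineers with two consecutive shifts starting less than 4 weeks apart."""
    starts_by_engineer = defaultdict(list)
    for s in shifts:
        starts_by_engineer[s.engineer].append(s.start)
    flagged = set()
    for engineer, starts in starts_by_engineer.items():
        starts.sort()
        if any(later - earlier < MIN_GAP for earlier, later in zip(starts, starts[1:])):
            flagged.add(engineer)
    return flagged

def pages_per_shift_by_region(shifts: list[Shift]) -> dict[str, float]:
    """Mean pages per shift, grouped by region."""
    pages = defaultdict(list)
    for s in shifts:
        pages[s.region].append(s.pages)
    return {region: sum(p) / len(p) for region, p in pages.items()}

def first_try_rate(escalations: list[Escalation]) -> Optional[float]:
    """Share of escalations routed to the owning team first; None if no data."""
    if not escalations:
        return None
    hits = sum(e.first_team == e.owning_team for e in escalations)
    return hits / len(escalations)

if __name__ == "__main__":
    # Made-up sample data for illustration only.
    shifts = [
        Shift("ana", "EMEA", date(2025, 1, 6), pages=3),
        Shift("bo", "EMEA", date(2025, 1, 13), pages=5),
        Shift("ana", "EMEA", date(2025, 1, 27), pages=1),
    ]
    escalations = [
        Escalation("INC-1", "network", "network"),
        Escalation("INC-2", "storage", "database"),
    ]
    print(shifts_per_engineer(shifts))
    print(cap_violations(shifts))   # ana's shifts start 21 days apart
    print(pages_per_shift_by_region(shifts))
    rate = first_try_rate(escalations)
    print(f"first-try routing: {rate:.0%} (target {FIRST_TRY_TARGET:.0%})")
```

The sketch compares consecutive shift start dates per engineer instead of counting shifts per calendar month. This catches back-to-back shifts that straddle a month boundary.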

Onboarding New Team Members

Adding Someone to the Rotation

See: Getting added to a rotation.

Required Onboarding Resources

Ensure new team members have access to, and understand, the first-shift information.

Iterate

As escalations come in, identify gaps in documentation:

  • When Tier 2 engineers escalate, ask what information would have helped them resolve it faster
  • Use post-incident retrospectives to identify runbook gaps
  • Prioritize creating or updating runbooks for frequent escalation patterns
  • Monitor whether incidents reference runbooks
  • Identify runbooks that aren’t being used and update or remove them
  • Create new runbooks based on escalation patterns
  • Aim for runbooks to cover 80% of common incidents
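Runbook usage can be tracked in a similar way. This Python sketch assumes that each incident record lists the runbooks referenced during the response, which is a hypothetical field. It uses the share of incidents that referenced at least one runbook as a rough proxy for the 80% coverage goal. It also lists catalog runbooks that no incident used.

```python
from dataclasses import dataclass, field
from typing import Optional

COVERAGE_TARGET = 0.80  # goal: runbooks cover 80% of common incidents

# Hypothetical incident record; "runbooks" holds the names of runbooks the
# responder referenced. Adapt to whatever your incident tooling records.
@dataclass
class Incident:
    incident_id: str
    runbooks: set[str] = field(default_factory=set)

def runbook_coverage(incidents: list[Incident]) -> Optional[float]:
    """Share of incidents that referenced at least one runbook (a proxy metric)."""
    if not incidents:
        return None
    return sum(bool(i.runbooks) for i in incidents) / len(incidents)

def unused_runbooks(catalog: list[str], incidents: list[Incident]) -> list[str]:
    """Runbooks no incident referenced: candidates to update or remove."""
    referenced = set().union(*(i.runbooks for i in incidents))
    return sorted(set(catalog) - referenced)

if __name__ == "__main__":
    # Made-up sample data for illustration only.
    catalog = ["cert-expiry", "disk-full", "dns-failover"]
    incidents = [
        Incident("INC-1", {"disk-full"}),
        Incident("INC-2"),
        Incident("INC-3", {"disk-full", "cert-expiry"}),
    ]
    coverage = runbook_coverage(incidents)
    print(f"runbook coverage: {coverage:.0%} (target {COVERAGE_TARGET:.0%})")
    print("unused:", unused_runbooks(catalog, incidents))
```

Counting an incident as "covered" when any runbook was referenced is generous. A stricter variant could also require that the runbook was sufficient to resolve the incident without escalation.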

For S1/S2 incidents (or significant S3/S4 incidents):

  • Ensure 100% of escalated S1/S2 incidents have a formal retrospective or write-up
  • Lead retrospectives in a blameless manner, focusing on system improvements
  • Document what was learned and what can be improved
  • Track action items and follow up on completion
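Retrospective coverage and action-item follow-up can be checked with a similar sketch. The IncidentRecord and ActionItem types and their fields are hypothetical, so adapt them to however your team records incidents and retrospectives. The sketch applies the 100% requirement to escalated S1/S2 incidents only, as stated above.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical record shapes for illustration; not taken from any tool.
@dataclass
class ActionItem:
    description: str
    owner: str
    done: bool = False

@dataclass
class IncidentRecord:
    incident_id: str
    severity: str          # "S1" through "S4"
    escalated: bool
    has_retro: bool = False
    action_items: list[ActionItem] = field(default_factory=list)

REQUIRES_RETRO = {"S1", "S2"}

def _requires_retro(incident: IncidentRecord) -> bool:
    return incident.escalated and incident.severity in REQUIRES_RETRO

def missing_retros(incidents: list[IncidentRecord]) -> list[str]:
    """Escalated S1/S2 incidents that still lack a retrospective or write-up."""
    return [i.incident_id for i in incidents if _requires_retro(i) and not i.has_retro]

def retro_coverage(incidents: list[IncidentRecord]) -> Optional[float]:
    """Share of escalated S1/S2 incidents with a retrospective (target: 100%)."""
    required = [i for i in incidents if _requires_retro(i)]
    if not required:
        return None
    return sum(i.has_retro for i in required) / len(required)

def open_action_items(incidents: list[IncidentRecord]) -> list[tuple[str, str, str]]:
    """(incident_id, owner, description) for every unfinished action item."""
    return [
        (i.incident_id, a.owner, a.description)
        for i in incidents
        for a in i.action_items
        if not a.done
    ]
```

Significant S3/S4 incidents can still get a retrospective. The sketch only flags the S1/S2 cases where a retrospective is mandatory.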

Quick Reference: Key Responsibilities

Schedule Management:

  • Maintain 6-12 people per region in the rotation
  • Publish schedules 1+ month in advance
  • Track effectiveness quarterly
  • Ensure no one is on call more than once every 4 weeks

Onboarding:

  • Add new members to Incident.io
  • Provide tool access and documentation
  • Support first shifts
  • Ensure training completion

Workload Tracking:

  • Monitor pages per shift
  • Track incident response metrics
  • Watch for burnout indicators
  • Conduct quarterly effectiveness reviews

Team Support:

  • Help with schedule conflicts and swaps
  • Create overrides for absences
  • Address unsustainable load
  • Support escalations during incidents

Improvement:

  • Identify runbook gaps
  • Lead blameless retrospectives
  • Track metrics and trends
  • Share learnings with the team
Last modified November 27, 2025: Remove trailing spaces (14d41894)