Handoffs and Continuity

The end of your on-call shift is as important as the beginning. A good handoff means the next engineer is set up to succeed.

What is a Handoff?

A handoff is a formal transfer of on-call responsibility from you to the next person.

For the SSCS 24x5 rotation, handoffs happen:

Every 8 hours as coverage moves between regions (APAC → EMEA → AMER → APAC)
At any time if an incident is ongoing when your shift ends
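
The rotation order above can be sketched as a simple cycle. This is only an illustration, assuming the region order and 8-hour shift length stated above; the names `ROTATION`, `SHIFT_HOURS`, and `next_region` are hypothetical and not part of any SSCS tooling.

```python
# Hypothetical sketch: region order taken from the rotation described above.
ROTATION = ["APAC", "EMEA", "AMER"]
SHIFT_HOURS = 8  # coverage moves between regions every 8 hours


def next_region(current: str) -> str:
    """Return the region that takes over on-call after `current`."""
    index = ROTATION.index(current)  # raises ValueError for unknown regions
    # Wrap around so that AMER hands back to APAC.
    return ROTATION[(index + 1) % len(ROTATION)]
```

For example, `next_region("AMER")` returns `"APAC"`, matching the last arrow in the rotation.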

Why Handoffs Matter

Poor handoffs cause:

Lost context and confusion
Issues being forgotten
Repeated work
Missed critical details
Stressed-out engineers

Good handoffs:

Give the next person what they need to succeed
Prevent duplication of effort
Ensure nothing falls through the cracks
Build trust in the rotation
Maintain continuity across timezones

What to Include in a Handoff

1. Current Incidents (Active Issues)

If you have any ongoing issues, include these details for each one:

What is the issue? — Clear description
What domain? — Authentication, Authorization, or Pipeline Security
What’s the impact? — How many users/services affected?
What have you tried? — What investigation steps?
Where are you stuck? — What’s the blocker?
What should they do next? — Your recommendation

Example:

“High authentication failure rate on SAML login. I restarted the authentication service twice; failures dropped to normal, but came back up within 30 minutes. Latest deployment was 4 hours ago. I think there’s a session state issue in the new SAML handler. Look at the deployment details and consider rolling back. If it gets worse, escalate to the Authentication EM.”
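
One way to make sure none of the six questions is skipped is to treat an active incident as a small record and check it before sending. The sketch below is a hypothetical helper, not an existing tool: the class name `ActiveIncident`, its field names, and the `problems` method are all assumptions made for illustration. Only the three domain names come from this page.

```python
from dataclasses import dataclass, fields

# The three domains named on this page.
DOMAINS = {"Authentication", "Authorization", "Pipeline Security"}


@dataclass
class ActiveIncident:
    """Hypothetical record mirroring the six handoff questions."""
    description: str     # What is the issue?
    domain: str          # Authentication, Authorization, or Pipeline Security
    impact: str          # How many users/services are affected?
    steps_tried: str     # What investigation steps have you taken?
    blocker: str         # Where are you stuck?
    recommendation: str  # What should the next person do?

    def problems(self) -> list[str]:
        """List empty fields and unknown domains before the handoff is sent."""
        issues = [f"missing: {f.name}" for f in fields(self)
                  if not getattr(self, f.name).strip()]
        if self.domain.strip() and self.domain not in DOMAINS:
            issues.append(f"unknown domain: {self.domain}")
        return issues
```

An empty list from `problems()` means every question has an answer. It does not check that the answers are useful, so the notes still need a human read-through.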

2. Recent Issues You Resolved

What did you fix during your shift?

What was the issue?
What domain was affected?
What was the root cause?
How did you fix it?
Is there follow-up work needed? (ticket, monitoring change, etc.)

Example:

“Resolved: Authorization policy evaluation errors around 3 PM. The policy cache wasn’t being refreshed due to a Redis connection timeout. I manually flushed the cache and fixed the Redis connection pool settings. Created issue #12345 to add monitoring for cache refresh failures.”

3. Alerts That Fired (Noise)

Which alerts went off but weren’t real issues?

What alert fired?
Why wasn’t it a real issue?
What should be done about it?

Example:

“PipelineSecurityScanTimeout fired 6 times but was always a false alarm. Scans completed successfully, just took longer than the alert threshold. We should either increase the threshold or adjust for expected scan duration. Not urgent, but worth noting.”
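
If you keep a running note of which alerts fired during the shift, a tally like the one below can produce the noise summary. This is a minimal sketch: the input format, a hand-recorded list of `(alert_name, was_real_issue)` pairs, and the function name `noisy_alerts` are assumptions, not an export from any alerting system.

```python
from collections import Counter


def noisy_alerts(firings: list[tuple[str, bool]]) -> dict[str, int]:
    """Count firings for alerts that never corresponded to a real issue.

    `firings` is a hypothetical list of (alert_name, was_real_issue) pairs
    recorded by hand during the shift.
    """
    total = Counter(name for name, _ in firings)
    # An alert that was real even once is not treated as pure noise.
    real = {name for name, was_real in firings if was_real}
    return {name: count for name, count in total.items() if name not in real}
```

With six false firings of `PipelineSecurityScanTimeout` and one real `HighAuthFailureRate`, the result contains only `PipelineSecurityScanTimeout` with a count of 6.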

4. Pending Changes or Deployments

Did anything get deployed or changed during your shift?

What changed?
When?
Why? (hotfix, planned change, etc.)
What should they watch for?

Example:

“Deployed authentication service version 2.1.3 at 8 PM to fix OAuth token refresh issues. Seems stable so far but monitor AuthTokenRefreshRate and SessionCreationErrors metrics closely for the next hour.”

5. Cross-Domain Context

Things the next person should know about interactions between domains:

Are multiple domains affected?
Are there dependencies between services?
Did you coordinate with other domain teams?

Example:

“The authentication issue earlier also caused authorization timeouts. I coordinated with @AuthorizationEngineer. Both services are stable now, but watch for cascading effects if authentication has issues again.”

6. Context You Have

Things the next person might not know:

Is the customer aware of any ongoing issues?
Is leadership tracking something?
Are there open incidents you escalated?
Was there a meeting or decision they should know about?

Example:

“The big pipeline security scanner update is tentatively going out tomorrow at 10 AM. Leadership is watching for issues, so if anything unusual happens overnight, don’t wait to escalate.”

How to Do a Handoff

Before Your Shift Ends

30 minutes before:

Start writing your handoff notes
Summarize any active issues
Gather details and links

Synchronous Handoff (Preferred)

If possible, talk to the next person:

Reach out: “Hey, I’m ending my shift now. Do you have 10 minutes for a quick handoff?”
Walk through: Go through each active issue
Answer questions: Let them ask clarifying questions
Verify understanding: Make sure they get it
Leave your contact: “You can Slack me if issues come up”

This takes 15-20 minutes but is worth it, especially for cross-timezone handoffs.

Asynchronous Handoff (When Needed)

If you can’t connect directly (common for cross-timezone handoffs):

Write detailed notes in Slack or a shared doc
Post them in the on-call channel so the next person sees them
Send a direct message to the next person: “@NextEngineer, check the on-call channel for my handoff notes”
Be specific so they don’t have to guess or search for context

Example Handoff Message

@Sarah, I'm handing off from AMER to APAC. Here's what you need to know:

**Active Issue:**

- HighAuthFailureRate on SAML login. Restarted service twice; failures are back up.
  Likely session state issue in new SAML handler from deployment 4 hours ago.
  Recommend checking the deploy; consider rolling back if it keeps happening.
  Domain: Authentication

**Resolved During My Shift:**

- Fixed authorization policy cache refresh (Redis connection timeout)
- Manually flushed cache at 3 PM
- Created issue #12345 for monitoring
  Domain: Authorization

**False Alarms:**

- PipelineSecurityScanTimeout fired 6 times. Not real, just longer scan times.
  Domain: Pipeline Security

**Deployments:**

- Authentication service v2.1.3 went out at 8 PM for OAuth fix. Stable so far but monitor metrics closely.

**FYI:**

- Big pipeline security scanner update tentatively going out tomorrow at 10 AM
- Leadership is tracking issues, so escalate early if anything seems wrong

Let me know if you have questions. I'm around for a bit if you need clarification.
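
A message in this shape can also be assembled from structured notes. The sketch below assumes the layout of the example above; the function name `render_handoff` and its parameters are hypothetical, and it does not post anything to Slack.

```python
def render_handoff(recipient: str, from_region: str, to_region: str,
                   sections: dict[str, list[str]]) -> str:
    """Build a handoff message; sections with no items are omitted."""
    lines = [
        f"@{recipient}, I'm handing off from {from_region} to {to_region}. "
        "Here's what you need to know:"
    ]
    for title, items in sections.items():
        if not items:
            continue  # skip empty sections rather than printing a bare heading
        lines += ["", f"**{title}:**", ""]
        lines += [f"- {item}" for item in items]
    lines += ["", "Let me know if you have questions."]
    return "\n".join(lines)
```

Sections appear in the order they are passed in, so put active issues first to keep the most urgent items at the top.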

What If You’re Still on an Incident?

If an issue isn’t resolved when your shift ends:

Tell the next person immediately
Don’t just disappear and leave it for them
Stay for a quick handoff conversation to explain context
Offer to help if they get stuck: “Let me know if it gets worse, I can jump in”

The next person shouldn’t inherit a mystery.

After the Handoff

The next person acknowledges they have the information
You’re officially off call, but stay near your laptop for 30 minutes
If they have questions, they can reach out
You’re available if they need help (but not required to jump back in)

What Happens If You Don’t Get a Good Handoff?

If you inherit a shift with no context:

Ask Slack — “Can someone give me context on this issue?”
Check recent deployments — What changed?
Look at Incident.io — Is there a ticket or log?
Reach out to whoever was on-call — Even if they’re off, they can help quickly

Inheriting a shift without context is frustrating, so don’t do this to others.

Common Handoff Mistakes to Avoid

❌ Disappearing Immediately

Don’t just go offline at the end of your shift. Give context first.

❌ Being Too Vague

“Everything was fine” is not helpful. Be specific about what you checked and what happened.

❌ Assuming They Know the Context

The next person might be in a different timezone and domain. Explain everything.

❌ Leaving Active Issues Unaddressed

If something is still breaking, don’t ignore it in the handoff. Explain what you’ve tried and what you recommend.

❌ Not Specifying the Domain

Always clarify whether the issue is Authentication, Authorization, or Pipeline Security.

❌ Being Defensive

If someone asks questions, answer them clearly. “I didn’t debug that far” is fine. “It’s probably not the issue” is not helpful.

Related Pages

Communication and Culture: Learn clear communication during handoffs
Your First Shift: Receive good handoffs when you take over
Coverage and Scheduling: Understand when handoffs occur

Last modified November 10, 2025: Fix oncall directory path and update links (5ca7f7ca)
