Communication and Culture

Being on-call isn’t just about technical skills. It’s also about working well with others during stressful situations. This page covers how we communicate and the culture we build.

Blameless Culture

The single most important thing about incident response: No one gets blamed.

What This Means

When an incident happens:

We ask “what happened?” not “who messed up?”
We look for system problems, not person problems
We learn from incidents rather than punishing people
Everyone contributes to postmortems without fear

Why It Matters

If people are afraid of being blamed:

They won’t escalate when they should
They’ll hide mistakes instead of fixing them
Incidents take longer to resolve
We don’t learn from failures

Your Role

When an incident happens:

Focus on fixing it, not assigning blame
If someone made a mistake, that’s okay—it’s a learning opportunity
In postmortems, discuss what the system could do better
Avoid phrases like “they should have known” or “that was careless”

Example

Bad: “John deployed broken code; he should have tested it”

Better: “The deployment process didn’t catch this issue. How can we improve testing or CI/CD?”

Communicating During Incidents

Being Clear

When you find something during an investigation, communicate it clearly:

✅ “I found the memory leak in the transaction handler; it started after the 4 PM deployment”
❌ “Things look bad”

Frequency

Update your team regularly:

At the start: “I’m investigating this”
Every 5-10 minutes: “Here’s what I’ve found”
When making changes: “Rolling back version 2.1.3”
When resolved: “Issue is fixed, monitoring for stability”
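
The cadence above can be turned into a simple reminder check. This is a minimal sketch, not an existing tool: the function name `update_overdue`, the constant `MAX_SILENCE`, and the timestamps are hypothetical, and the 10-minute limit assumes the upper end of the "every 5-10 minutes" guidance.

```python
from datetime import datetime, timedelta

# Assumption: 10 minutes is the longest acceptable gap between updates.
MAX_SILENCE = timedelta(minutes=10)


def update_overdue(last_update: datetime, now: datetime,
                   max_silence: timedelta = MAX_SILENCE) -> bool:
    """Return True when more than max_silence has passed since the last update."""
    return now - last_update > max_silence


# Hypothetical timestamps for illustration only.
last = datetime(2025, 1, 1, 16, 0)
print(update_overdue(last, datetime(2025, 1, 1, 16, 7)))   # False
print(update_overdue(last, datetime(2025, 1, 1, 16, 12)))  # True
```

A check like this could back a bot or a personal timer; the point is only that the expected gap between updates is explicit rather than left to memory during a stressful incident.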

Escalation Communication

When escalating:

✅ “I’ve investigated for 20 minutes, looked at X, Y, Z. This is beyond my expertise. Escalating to the database team.”
❌ “I don’t know what to do”

Give context. Help the next person understand what you’ve tried.

Over-Communication is Okay

It’s better to update too often than not often enough. People would rather see regular updates than sit wondering if you’re still looking at the issue.

Slack During Incidents

Using Slack Effectively

Post in the incident channel, not DMs
Use threads to keep discussions organized
Pin important information
Use @channel or @here only for urgent messages, not routine updates

Incident Channel Norms

Most teams have standards like:

Updates at least every 10 minutes
Clear status (Investigating / Mitigating / Resolved / Monitoring)
Action items with owners
Links to dashboards or logs when helpful
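
One way to keep updates consistent with these norms is to give them a fixed shape. The sketch below is illustrative only: the class `IncidentUpdate`, its field names, the owner name, and the example URL are assumptions, not part of any team tooling. The four status values come from the list above.

```python
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    INVESTIGATING = "Investigating"
    MITIGATING = "Mitigating"
    RESOLVED = "Resolved"
    MONITORING = "Monitoring"


@dataclass
class IncidentUpdate:
    """A single channel update: status, summary, owned actions, and links."""
    status: Status
    summary: str
    action_items: dict[str, str] = field(default_factory=dict)  # action -> owner
    links: list[str] = field(default_factory=list)

    def render(self) -> str:
        """Format the update as plain text suitable for pasting into a channel."""
        lines = [f"Status: {self.status.value}", self.summary]
        for action, owner in self.action_items.items():
            lines.append(f"Action: {action} (owner: {owner})")
        for link in self.links:
            lines.append(f"Link: {link}")
        return "\n".join(lines)


# Hypothetical owner and dashboard URL, for illustration only.
update = IncidentUpdate(
    status=Status.MITIGATING,
    summary="Rolling back version 2.1.3",
    action_items={"Confirm error rate after rollback": "alice"},
    links=["https://example.com/dashboard"],
)
print(update.render())
```

Requiring a status and a summary on every update makes it harder to post something vague, and pairing each action with an owner avoids items that nobody picks up.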

What NOT to Do

❌ Start side conversations in DMs (keep context in the channel)
❌ Go silent for 30+ minutes (always update progress)
❌ Use vague language (“it looks like…” without evidence)
❌ Blame others

Before You Get Paged: Building Relationships

Being a Good Teammate

Answer questions from newer engineers
Share what you’ve learned from incidents
Update runbooks so others can learn
Acknowledge good work from others

Escalation Communication

When to Escalate

You’ve investigated for 15+ minutes and are stuck
It’s beyond your domain
The incident is too urgent for you to keep working through it alone
You need help making a decision

How to Escalate

In Slack and/or Incident.io:

Why: “This is a database issue, I need database expertise”
What you’ve tried: “I checked dashboards, logs, and recent deployments. Nothing obvious.”
What’s needed: “Need someone to check DB replication status”
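
The three parts above (why, what you've tried, what's needed) can be sketched as a small message builder. This is a hedged illustration, not an Incident.io or Slack API: the function `escalation_message` and its parameters are hypothetical, and the output is plain text you would paste or post yourself.

```python
def escalation_message(why: str, tried: list[str], needed: str) -> str:
    """Build an escalation message with the three parts: why, tried, needed."""
    if not tried:
        # An escalation without context forces the next person to start over.
        raise ValueError("list at least one thing you have already checked")
    tried_text = ", ".join(tried)
    return (
        f"Why: {why}\n"
        f"What I've tried: {tried_text}\n"
        f"What's needed: {needed}"
    )


print(escalation_message(
    why="This is a database issue, I need database expertise",
    tried=["dashboards", "logs", "recent deployments"],
    needed="Someone to check DB replication status",
))
```

Rejecting an empty `tried` list is a deliberate choice in this sketch: it mirrors the guidance to give context so the person receiving the escalation knows what has already been ruled out.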

Getting Escalated

When someone escalates to you:

Respond quickly
Thank them for doing the groundwork
Take the investigation forward
Keep them in the loop

Post-Incident Communication

Postmortems

After significant incidents, your team holds a postmortem:

What happened? — The sequence of events
Why did it happen? — Root causes
What did we learn? — Takeaways
What can we improve? — Action items

Participation

Everyone involved participates
Be honest about mistakes (without blame)
Contribute ideas for improvement
Follow up on action items

Blameless Postmortem Language

During the postmortem:

✅ “The deployment process didn’t catch this issue”
✅ “We didn’t have monitoring for this condition”
✅ “The runbook didn’t have a step for this scenario”

Not:

❌ “Person X made a mistake”
❌ “They didn’t follow the process”

Team Norms and Expectations

Response Times

Acknowledge a page within 15 minutes, and be at your place of work shortly thereafter to assist with incident resolution.

Staying Engaged

Don’t disappear while investigating. Even if you’re stuck:

“Still investigating, haven’t found the root cause yet”
“Escalating because this is beyond my expertise”
“Waiting for the next level to respond”

Silence creates anxiety.

Professional Behavior

During incidents:

Stay calm
Be respectful
Admit when you don’t know something
Ask for help
Don’t let stress turn into rudeness

We’re all on the same team.

When Things Go Wrong

You Make a Mistake During an Incident

Acknowledge it: “I made an error, here’s what I’m doing to fix it”
Fix it: Focus on resolving the impact
Learn from it: How will you prevent this next time?

You won’t be blamed. We all make mistakes.

Someone Else Makes a Mistake

Don’t call them out publicly
Focus on fixing the issue
In the postmortem, discuss what the system could do better
Privately, offer to help them learn

Blaming Happens (But Shouldn’t)

If you hear blame-focused language in postmortems or Slack:

Gently redirect: “Let’s focus on system improvements rather than individual mistakes”
Bring it up with your manager: “I think we’re using language that isn’t aligned with blameless culture”

Cultural Observations

Good Signs

Incidents are discussed openly
People escalate without fear
Mistakes are treated as learning opportunities
Postmortems focus on systems, not people
People thank each other during incidents

Warning Signs

People are afraid to escalate
Blame is assigned in postmortems
Mistakes are hidden
People are defensive
Burnout is common

If you see warning signs, mention them to your manager or rotation leader.

Building Psychological Safety

Psychological safety means you feel safe taking risks, admitting mistakes, and asking questions.

Ways we build it:

Blameless culture in incidents and postmortems
Experienced engineers mentoring newer ones
Questions are encouraged
Escalation is valued, not punished
Time for learning and improvement

Related Pages

Handoffs and Continuity — Apply blameless culture to handoffs
Your First Shift — Use these principles when you get paged
Measuring Success — See how escalation communication impacts metrics

Last modified October 19, 2025: drs-add-landing-page-tier2 (6f2cba79)
