Incident Response Lifecycle
The Incident Response Lifecycle working group is intended to document a shared incident response protocol and knowledge base.
Attributes
| Property | Value | 
|---|---|
| Date Created | February 01, 2022 | 
| Target End Date | April 30, 2022 | 
| Slack | wg-incident-respose-management-framework | 
| Google Doc | Incident Response Management Working Group (internal) | 
| Issue Label | WG-IRM (gitlab-com/-org) | 
Business Goal
- Increase efficiency through common incident response, analysis, documentation, ongoing management and reporting methods.
 - Increase transparency by improving the visibility and communication of incidents to the business and e-group
 - Support results by building our clients’ confidence in GitLab’s ability to quickly resolve and communicate incidents when they occur
 - Align Incident Management activities and priorities with those of the business
 - Prepare materials for the creation of training modules for the Engineering Department on the Incident Management Process at GitLab
 - Highlight dogfooding opportunities
 
Exit Criteria
- Single source of truth documenting incident response management that will be applicable to all areas of Engineering and teams who provide Incident Response
- Each functional area of Engineering will develop its own incident management requirements for identifying and reacting to service outages or security threats.
 
 - Create a comprehensive knowledge base for GitLab team members to help them understand how incident response teams implement the IR process
 
Outcome
- Help teams across GitLab lower MTTR
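
To make the MTTR outcome concrete, the following is a minimal sketch of how the metric could be computed. It assumes MTTR is read as mean time to resolve (this page does not expand the acronym) and that each incident is recorded as a pair of detection and resolution timestamps. The function name and the sample incidents are hypothetical and not taken from any GitLab tooling.

```python
from datetime import datetime, timedelta


def mean_time_to_resolve(incidents):
    """Return the mean of (resolved - detected) across incidents as a timedelta.

    `incidents` is a list of (detected, resolved) datetime pairs.
    This record shape is an assumption made for illustration.
    """
    if not incidents:
        raise ValueError("at least one incident is required")
    # Sum the individual durations, starting from a zero timedelta.
    total = sum((resolved - detected for detected, resolved in incidents), timedelta())
    return total / len(incidents)


# Hypothetical sample data: one 30-minute and one 90-minute incident.
incidents = [
    (datetime(2022, 2, 1, 10, 0), datetime(2022, 2, 1, 10, 30)),
    (datetime(2022, 2, 3, 14, 0), datetime(2022, 2, 3, 15, 30)),
]

print(mean_time_to_resolve(incidents))  # 1:00:00
```

Whether the clock should start at detection, at declaration, or at first customer impact is a definition the working group would need to agree on; the sketch only shows the arithmetic.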
 
Other Investigations
- Improvements in feedback and learnings from incidents to build resilience
 - Service Catalog
 
What do other companies do?
How is IR done today?
- SIRT
 - On-call
 - Reliability
 - Support
- How to Perform CMOC Duties
 - Contacting Customers
 - Sending Notices (small number of users)
 
 
Noted issues
Related Issues
Roles and Responsibilities
| Working Group Role | Person | Title | 
|---|---|---|
| Facilitator | Anna Liisa Moter | Manager Reliability | 
| Exec Sponsor | Steve Loyd | VP Infrastructure | 
| Member | Anthony Fappiano | Manager Reliability | 
| Development Functional Lead | Dan Croft | Senior Engineering Manager, Ops | 
| Member | Sam Goldstein | Director of Engineering, Ops | 
| Member (CMOC) | Kenneth Chu | Support team | 
| Member | Kevin Chu | Group Manager of Product, Monitor | 
Requirements and Considerations
Actors
- Reliability Engineers
 - SIRT Engineers
 - Development Team
 - Quality Team
 - Support Team
 
General
- As a GitLab team member who can raise an incident, I know how an incident can be initiated
 - As a GitLab team member who can raise an incident, I have a general understanding of incident severity levels
 - As a GitLab team member who can raise an incident, I understand the high-level process of Incident Management and its importance to the business
 - As a GitLab team member who can raise an incident, I can contact the right team via a dedicated Slack channel
 - As a GitLab team member who can raise an incident, I can easily find a page in the handbook that documents the Incident Response Procedures
 
SIRT Engineer
- As a SIRT Engineer I know how to pull relevant resources from other teams when I need assistance
 - As a SIRT Engineer I can easily categorize the incident
 - As a SIRT Engineer I can identify triggers and indicators
 - As a SIRT Engineer I know where to document the incident details
 - As a SIRT Engineer I know when to transition from incident identification to mitigation, to remediation, and to post-incident activities
 - As a SIRT Engineer I can follow a reporting process to hand off incidents or provide updates to Management
 
Reliability Engineers
- As a Reliability Engineer, I know how to level an incident in a manner that is consistent across the org
 - As a Reliability Engineer, I know how to engage the other roles during an incident
 - As a Reliability Engineer, I know when to transition from incident identification to mitigation, to resolution, and to post-incident activities
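
The phase transitions named in the SIRT and Reliability stories can be sketched as a small state machine. This is only an illustration: the strict linear ordering, the `Phase` and `advance` names, and the choice of "resolution" (the SIRT story says "remediation") are assumptions, not part of any documented GitLab process.

```python
from enum import Enum


class Phase(Enum):
    IDENTIFICATION = "identification"
    MITIGATION = "mitigation"
    RESOLUTION = "resolution"  # the SIRT story calls this step "remediation"
    POST_INCIDENT = "post-incident"


# Assumed ordering: each phase may only move to the next one.
ALLOWED = {
    Phase.IDENTIFICATION: {Phase.MITIGATION},
    Phase.MITIGATION: {Phase.RESOLUTION},
    Phase.RESOLUTION: {Phase.POST_INCIDENT},
    Phase.POST_INCIDENT: set(),
}


def advance(current, target):
    """Return `target` if the move is allowed, otherwise raise ValueError."""
    if target not in ALLOWED[current]:
        raise ValueError(f"cannot move from {current.value} to {target.value}")
    return target


phase = Phase.IDENTIFICATION
phase = advance(phase, Phase.MITIGATION)
print(phase.value)  # mitigation
```

A real process may need to allow moving backwards, for example from mitigation back to identification when the first diagnosis turns out to be wrong; the transition table is the place to encode that decision.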
 
Development Team
- As a leader in Development who is part of the Incident Manager rotation, I am clear on the role’s responsibilities and how the role supports the Incident Management process.
 
Quality Team
Support Team
- As a Support Engineer, I know how to create a status page
 - As a Support Engineer, I know the differences between the Incident Status states on the status page
 - As a Support Engineer, I know how frequently to update the status page
 - As a Support Engineer, I know how to engage the Incident Manager or EOC when asking for feedback on an update I am about to post on the status page
 - As a Support Engineer, I know how to notify the stakeholders
 - As a Support Engineer, I know how to find related tickets in Zendesk and the GitLab issue tracker to help assess the impact of an incident
 - As a Support Engineer, I know how to contact users if their usage of GitLab SaaS was restricted due to an incident
 
Last modified June 3, 2025: Fix broken links (d7547623)
