On-Call Processes and Policies - Tier 1

Tier 1 rotations are on-call rotations that respond to pages from automated systems.

Active Tier 1 Rotations

SRE EOC GitLab.com

Responsibilities

In addition to incident management responsibilities, the EOC is also responsible for time-sensitive interrupt work that is required to support the production environment and is not owned by another team. This includes:

  1. Fulfilling Security Incident Response Team (SIRT) requests
  2. Fulfilling Legal Preservation requests
  3. Reviewing and handling certain change requests (CRs). This includes:
    1. Reviewing CRs to ensure they do not conflict with any ongoing incidents or investigations
    2. Executing the CR directly if the author does not have the required permissions to make the change themselves (such as admin-level changes)
    3. Providing support during C1 CRs, such as database upgrades, that may occur on weekends
  4. Handling incident-related Teleport access requests
  5. Approving exceptions to run ChatOps commands that fail their safety checks
  6. Investigating and fixing buggy or flapping alerts
  7. Removing alerts that are no longer relevant
  8. Collecting production information when requested
  9. Responding to @sre-oncall Slack mentions
  10. Assisting Release Managers with deployment problems
  11. Being the DRI for incident reviews

GitLab Dedicated Platform

  • Rotation Leader: Florbela Viegas
  • Coverage: 24x7
  • Schedule: schedule

GitLab Dedicated PubSec

  • Rotation Leader: Florbela Viegas
  • Coverage: 24x7
  • Schedule: schedule

Incident Managers (aka IMOC)

  • Rotation Leader: Devin Sylva
  • Coverage: 24x7
  • Schedule: schedule

Further details