Service Maturity Model

Introduction

This page shows the output of our service maturity model for each service in our metrics catalog. The model itself is part of the metrics catalog, and uses information from the metrics catalog and the service catalog to score each service.

To achieve a particular level in the maturity model, a service must meet all the criteria for that level and all previous levels. Some criteria do not apply to all services (for instance, services like PgBouncer do not need development documentation).
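
The level rule above can be sketched in a few lines of Python. This is an illustrative sketch only, not the model's actual implementation. The `Result` enum and the `maturity_level` function are hypothetical names. The rule that a level counts only if at least one of its criteria is measured and passed is an assumption inferred from the tables on this page, where services stop at Level 3 even though no Level 4 criterion fails.

```python
from enum import Enum


class Result(Enum):
    PASSED = "passed"
    FAILED = "failed"
    SKIPPED = "skipped"              # criterion does not apply to this service
    UNIMPLEMENTED = "unimplemented"  # criterion is not measured yet


def maturity_level(levels):
    """Return the highest level reached, or 0 if Level 1 is not met.

    `levels` is ordered from Level 1 upward; each entry is the list of
    Result values for that level's criteria.
    """
    achieved = 0
    for number, results in enumerate(levels, start=1):
        no_failures = Result.FAILED not in results
        # Assumption: a level with nothing measured and passed is not met.
        has_pass = Result.PASSED in results
        if not (no_failures and has_pass):
            break  # a missed level blocks all later levels
        achieved = number
    return achieved


P, F, S, U = Result.PASSED, Result.FAILED, Result.SKIPPED, Result.UNIMPLEMENTED

# Results transcribed from the "atlantis detail" table on this page.
atlantis = [
    [P, S, S],              # Level 1
    [F, P, P],              # Level 2: apdex SLO monitoring fails
    [P, S, F, U, S, P, U],  # Level 3
    [U, U, U],              # Level 4
    [U, U, U],              # Level 5
]

# Results transcribed from the "ai-assisted detail" table on this page.
ai_assisted = [
    [P, P, P],
    [P, P, P],
    [P, U, P, U, P, P, U],
    [U, U, U],
    [U, U, U],
]

print(maturity_level(atlantis))     # 1
print(maturity_level(ai_assisted))  # 3
```

Under these assumptions, skipped and not-yet-measured criteria never block a level, which matches the scores listed for both services in the summary table.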

Maturity score by service

❌ indicates that the service does not meet even the Level 1 criteria.

Service Level
ai-assisted Level 3
ai-gateway Level 2
api Level 2
atlantis Level 1
camoproxy Level 2
ci-runners Level 2
cloud-sql Level 1
cloudflare Level 1
consul Level 1
customersdot Level 3
errortracking Level 2
ext-pvs Level 3
external-dns Level 1
frontend Level 1
git Level 2
gitaly Level 3
gitlab-static Level 1
glgo Level 2
google-cloud-storage Level 2
internal-api Level 3
istio Level 2
jaeger Level 2
kas Level 3
kube Level 2
logging Level 1
mailgun Level 3
mailroom Level 1
memorystore Level 1
mimir Level 2
monitoring Level 2
nat Level 1
nginx Level 1
ops-gitlab-net Level 3
packagecloud Level 1
patroni Level 2
patroni-ci Level 2
patroni-embedding Level 1
patroni-registry Level 1
pgbouncer Level 1
pgbouncer-ci Level 1
pgbouncer-embedding Level 1
pgbouncer-registry Level 1
plantuml Level 1
postgres-archive Level 1
redis Level 3
redis-cluster-cache Level 3
redis-cluster-chat-cache Level 3
redis-cluster-feature-flag Level 3
redis-cluster-queues-meta Level 3
redis-cluster-ratelimiting Level 3
redis-cluster-repo-cache Level 3
redis-cluster-shared-state Level 3
redis-db-load-balancing Level 3
redis-pubsub Level 3
redis-registry-cache Level 2
redis-sessions Level 3
redis-sidekiq Level 3
redis-tracechunks Level 3
registry Level 2
runway Level 1
search Level 1
sentry Level 2
sidekiq Level 2
thanos Level 2
tracing Level 2
vault Level 2
web Level 2
web-pages Level 2
websockets Level 2
woodhouse Level 3

Maturity detail by service

Key:

  • ✅ Service meets the criteria
  • ❌ Service does not meet the criteria
  • ➖ The criterion is skipped. Some maturity criteria make less sense for some services. For example, an infrastructure-facing service like Patroni is crucial to operations but is not related to our Development department, so it does not require development guidelines.
  • ⚪ We don't measure this criterion yet. See https://gitlab.com/groups/gitlab-com/gl-infra/-/epics/560 for progress.

ai-assisted detail

Level Criterion Passed
Level 1 Exists in the service catalog ✅ 1
Structured logs available in Kibana ✅ 1, 2, 3, 4, 5, 6
Service exists in the dependency graph ✅ 1
Level 2 SLO monitoring: apdex ✅ 1
SLO monitoring: error rate ✅ 1
SLO monitoring: request rate ✅ 1
Level 3 Service health dashboards ✅ 1
SLA calculations driven from SLO metrics ⚪ Not Implemented
All components include an apdex ✅ 1
Logging includes metadata for measuring scalability ⚪ Not Implemented
Developer guides exist in developer documentation ✅ 1
SRE guides exist in runbooks ✅ 1
Metrics on downstream service usage ⚪ Not Implemented
Level 4 Prepared Kibana dashboards ⚪ Not Implemented
Dashboards linked from metrics catalogs ⚪ Not Implemented
Automatic alert routing ⚪ Not Implemented
Level 5 Long-term forecasting utilization and usage ⚪ Not Implemented
70% of requests covered by at least one SLI ⚪ Not Implemented
Automatic alert routing ⚪ Not Implemented

ai-gateway detail

Level Criterion Passed
Level 1 Exists in the service catalog ✅ 1
Structured logs available in Kibana ➖
Reason: Runway structured logs are temporarily available in Stackdriver
Service exists in the dependency graph ➖
Reason: Runway services are deployed outside of the monolith
Level 2 SLO monitoring: apdex ✅ 1
SLO monitoring: error rate ✅ 1
SLO monitoring: request rate ✅ 1
Level 3 Service health dashboards ✅ 1, 2, 3, 4
SLA calculations driven from SLO metrics ➖
Reason: Service is not user facing
All components include an apdex ❌
Logging includes metadata for measuring scalability ⚪ Not Implemented
Developer guides exist in developer documentation ❌
SRE guides exist in runbooks ❌
Metrics on downstream service usage ⚪ Not Implemented
Level 4 Prepared Kibana dashboards ⚪ Not Implemented
Dashboards linked from metrics catalogs ⚪ Not Implemented
Automatic alert routing ⚪ Not Implemented
Level 5 Long-term forecasting utilization and usage ⚪ Not Implemented
70% of requests covered by at least one SLI ⚪ Not Implemented
Automatic alert routing ⚪ Not Implemented

api detail

Level Criterion Passed
Level 1 Exists in the service catalog ✅ 1
Structured logs available in Kibana ✅ 1, 2, 3, 4, 5, 6, 7, 8, 9
Service exists in the dependency graph ✅ 1
Level 2 SLO monitoring: apdex ✅ 1
SLO monitoring: error rate ✅ 1
SLO monitoring: request rate ✅ 1
Level 3 Service health dashboards ✅ 1, 2, 3, 4, 5, 6, 7, 8
SLA calculations driven from SLO metrics ⚪ Not Implemented
All components include an apdex ❌
Logging includes metadata for measuring scalability ⚪ Not Implemented
Developer guides exist in developer documentation ✅ 1
SRE guides exist in runbooks ✅ 1
Metrics on downstream service usage ⚪ Not Implemented
Level 4 Prepared Kibana dashboards ⚪ Not Implemented
Dashboards linked from metrics catalogs ⚪ Not Implemented
Automatic alert routing ⚪ Not Implemented
Level 5 Long-term forecasting utilization and usage ⚪ Not Implemented
70% of requests covered by at least one SLI ⚪ Not Implemented
Automatic alert routing ⚪ Not Implemented

atlantis detail

Level Criterion Passed
Level 1 Exists in the service catalog ✅ 1
Structured logs available in Kibana ➖
Reason: Atlantis is a work in progress, see https://gitlab.com/gitlab-com/gl-infra/reliability/-/issues/24613
Service exists in the dependency graph ➖
Reason: Atlantis is a work in progress, see https://gitlab.com/gitlab-com/gl-infra/reliability/-/issues/24613
Level 2 SLO monitoring: apdex ❌
SLO monitoring: error rate ✅ 1
SLO monitoring: request rate ✅ 1
Level 3 Service health dashboards ✅ 1
SLA calculations driven from SLO metrics ➖
Reason: Service is not user facing
All components include an apdex ❌
Logging includes metadata for measuring scalability ⚪ Not Implemented
Developer guides exist in developer documentation ➖
Reason: Atlantis is an infrastructure component, developers do not interact with it
SRE guides exist in runbooks ✅ 1
Metrics on downstream service usage ⚪ Not Implemented
Level 4 Prepared Kibana dashboards ⚪ Not Implemented
Dashboards linked from metrics catalogs ⚪ Not Implemented
Automatic alert routing ⚪ Not Implemented
Level 5 Long-term forecasting utilization and usage ⚪ Not Implemented
70% of requests covered by at least one SLI ⚪ Not Implemented
Automatic alert routing ⚪ Not Implemented

camoproxy detail

Level Criterion Passed
Level 1 Exists in the service catalog ✅ 1
Structured logs available in Kibana ✅ 1, 2
Service exists in the dependency graph ➖
Reason: Camoproxy does not interact directly with any declared services in our system
Level 2 SLO monitoring: apdex ✅ 1
SLO monitoring: error rate ✅ 1
SLO monitoring: request rate ✅ 1
Level 3 Service health dashboards ✅ 1
SLA calculations driven from SLO metrics ➖
Reason: Service is not user facing
All components include an apdex ❌
Logging includes metadata for measuring scalability ⚪ Not Implemented
Developer guides exist in developer documentation ✅ 1
SRE guides exist in runbooks ✅ 1
Metrics on downstream service usage ⚪ Not Implemented
Level 4 Prepared Kibana dashboards ⚪ Not Implemented
Dashboards linked from metrics catalogs ⚪ Not Implemented
Automatic alert routing ⚪ Not Implemented
Level 5 Long-term forecasting utilization and usage ⚪ Not Implemented
70% of requests covered by at least one SLI ⚪ Not Implemented
Automatic alert routing ⚪ Not Implemented

ci-runners detail

Level Criterion Passed
Level 1 Exists in the service catalog ✅ 1
Structured logs available in Kibana ✅ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12
Service exists in the dependency graph ✅ 1
Level 2 SLO monitoring: apdex ✅ 1
SLO monitoring: error rate ✅ 1
SLO monitoring: request rate ✅ 1
Level 3 Service health dashboards ✅ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11
SLA calculations driven from SLO metrics ⚪ Not Implemented
All components include an apdex ❌
Logging includes metadata for measuring scalability ⚪ Not Implemented
Developer guides exist in developer documentation ✅ 1
SRE guides exist in runbooks ✅ 1
Metrics on downstream service usage ⚪ Not Implemented
Level 4 Prepared Kibana dashboards ⚪ Not Implemented
Dashboards linked from metrics catalogs ⚪ Not Implemented
Automatic alert routing ⚪ Not Implemented
Level 5 Long-term forecasting utilization and usage ⚪ Not Implemented
70% of requests covered by at least one SLI ⚪ Not Implemented
Automatic alert routing ⚪ Not Implemented

cloud-sql detail

Level Criterion Passed
Level 1 Exists in the service catalog ✅ 1
Structured logs available in Kibana ➖
Reason: Cloud SQL is a managed service of GCP. The logs are available in Stackdriver.
Service exists in the dependency graph ✅ 1
Level 2 SLO monitoring: apdex ❌
SLO monitoring: error rate ❌
SLO monitoring: request rate ❌
Level 3 Service health dashboards ✅ 1
SLA calculations driven from SLO metrics ➖
Reason: Service is not user facing
All components include an apdex ❌
Logging includes metadata for measuring scalability ⚪ Not Implemented
Developer guides exist in developer documentation ➖
Reason: Cloud SQL is an infrastructure component, powered by GCP
SRE guides exist in runbooks ✅ 1
Metrics on downstream service usage ⚪ Not Implemented
Level 4 Prepared Kibana dashboards ⚪ Not Implemented
Dashboards linked from metrics catalogs ⚪ Not Implemented
Automatic alert routing ⚪ Not Implemented
Level 5 Long-term forecasting utilization and usage ⚪ Not Implemented
70% of requests covered by at least one SLI ⚪ Not Implemented
Automatic alert routing ⚪ Not Implemented

cloudflare detail

Level Criterion Passed
Level 1 Exists in the service catalog ✅ 1
Structured logs available in Kibana ➖
Reason: Logs from Cloudflare are pushed to a GCS bucket by Cloudflare and are not ingested into Elasticsearch due to volume. See https://gitlab.com/gitlab-com/runbooks/-/blob/master/docs/cloudflare/logging.md for alternatives
Service exists in the dependency graph ✅ 1
Level 2 SLO monitoring: apdex ❌
SLO monitoring: error rate ✅ 1
SLO monitoring: request rate ✅ 1
Level 3 Service health dashboards ✅ 1
SLA calculations driven from SLO metrics ➖
Reason: Service is not user facing
All components include an apdex ❌
Logging includes metadata for measuring scalability ⚪ Not Implemented
Developer guides exist in developer documentation ➖
Reason: WAF is an infrastructure component, powered by Cloudflare
SRE guides exist in runbooks ✅ 1
Metrics on downstream service usage ⚪ Not Implemented
Level 4 Prepared Kibana dashboards ⚪ Not Implemented
Dashboards linked from metrics catalogs ⚪ Not Implemented
Automatic alert routing ⚪ Not Implemented
Level 5 Long-term forecasting utilization and usage ⚪ Not Implemented
70% of requests covered by at least one SLI ⚪ Not Implemented
Automatic alert routing ⚪ Not Implemented

consul detail

Level Criterion Passed
Level 1 Exists in the service catalog ✅ 1
Structured logs available in Kibana ✅ 1
Service exists in the dependency graph ✅ 1
Level 2 SLO monitoring: apdex ❌
SLO monitoring: error rate ✅ 1
SLO monitoring: request rate ✅ 1
Level 3 Service health dashboards ✅ 1, 2, 3, 4, 5, 6
SLA calculations driven from SLO metrics ➖
Reason: Service is not user facing
All components include an apdex ❌
Logging includes metadata for measuring scalability ⚪ Not Implemented
Developer guides exist in developer documentation ➖
Reason: Consul is an infrastructure component, developers do not interact with it
SRE guides exist in runbooks ✅ 1
Metrics on downstream service usage ⚪ Not Implemented
Level 4 Prepared Kibana dashboards ⚪ Not Implemented
Dashboards linked from metrics catalogs ⚪ Not Implemented
Automatic alert routing ⚪ Not Implemented
Level 5 Long-term forecasting utilization and usage ⚪ Not Implemented
70% of requests covered by at least one SLI ⚪ Not Implemented
Automatic alert routing ⚪ Not Implemented

customersdot detail

Level Criterion Passed
Level 1 Exists in the service catalog ✅ 1
Structured logs available in Kibana ➖
Reason: All logs are available in Stackdriver
Service exists in the dependency graph ✅ 1
Level 2 SLO monitoring: apdex ✅ 1
SLO monitoring: error rate ✅ 1
SLO monitoring: request rate ✅ 1
Level 3 Service health dashboards ✅ 1
SLA calculations driven from SLO metrics ➖
Reason: Service is not user facing
All components include an apdex ✅ 1
Logging includes metadata for measuring scalability ⚪ Not Implemented
Developer guides exist in developer documentation ✅ 1
SRE guides exist in runbooks ✅ 1
Metrics on downstream service usage ⚪ Not Implemented
Level 4 Prepared Kibana dashboards ⚪ Not Implemented
Dashboards linked from metrics catalogs ⚪ Not Implemented
Automatic alert routing ⚪ Not Implemented
Level 5 Long-term forecasting utilization and usage ⚪ Not Implemented
70% of requests covered by at least one SLI ⚪ Not Implemented
Automatic alert routing ⚪ Not Implemented

errortracking detail

Level Criterion Passed
Level 1 Exists in the service catalog ✅ 1
Structured logs available in Kibana ✅ 1
Service exists in the dependency graph ✅ 1
Level 2 SLO monitoring: apdex ✅ 1
SLO monitoring: error rate ✅ 1
SLO monitoring: request rate ✅ 1
Level 3 Service health dashboards ✅ 1
SLA calculations driven from SLO metrics ➖
Reason: Service is not user facing
All components include an apdex ✅ 1
Logging includes metadata for measuring scalability ⚪ Not Implemented
Developer guides exist in developer documentation ✅ 1
SRE guides exist in runbooks ❌
Metrics on downstream service usage ⚪ Not Implemented
Level 4 Prepared Kibana dashboards ⚪ Not Implemented
Dashboards linked from metrics catalogs ⚪ Not Implemented
Automatic alert routing ⚪ Not Implemented
Level 5 Long-term forecasting utilization and usage ⚪ Not Implemented
70% of requests covered by at least one SLI ⚪ Not Implemented
Automatic alert routing ⚪ Not Implemented

ext-pvs detail

Level Criterion Passed
Level 1 Exists in the service catalog ✅ 1
Structured logs available in Kibana ➖
Reason: Runway structured logs are temporarily available in Stackdriver
Service exists in the dependency graph ➖
Reason: Runway services are deployed outside of the monolith
Level 2 SLO monitoring: apdex ✅ 1
SLO monitoring: error rate ✅ 1
SLO monitoring: request rate ✅ 1
Level 3 Service health dashboards ✅ 1
SLA calculations driven from SLO metrics ➖
Reason: Service is not user facing
All components include an apdex ✅ 1
Logging includes metadata for measuring scalability ⚪ Not Implemented
Developer guides exist in developer documentation ✅ 1
SRE guides exist in runbooks ✅ 1
Metrics on downstream service usage ⚪ Not Implemented
Level 4 Prepared Kibana dashboards ⚪ Not Implemented
Dashboards linked from metrics catalogs ⚪ Not Implemented
Automatic alert routing ⚪ Not Implemented
Level 5 Long-term forecasting utilization and usage ⚪ Not Implemented
70% of requests covered by at least one SLI ⚪ Not Implemented
Automatic alert routing ⚪ Not Implemented

external-dns detail

Level Criterion Passed
Level 1 Exists in the service catalog ✅ 1
Structured logs available in Kibana ➖
Reason: Logs from external-dns are not ingested into Elasticsearch due to volume. The logs are also available in Stackdriver
Service exists in the dependency graph ✅ 1
Level 2 SLO monitoring: apdex ❌
SLO monitoring: error rate ❌
SLO monitoring: request rate ❌
Level 3 Service health dashboards ✅ 1
SLA calculations driven from SLO metrics ➖
Reason: Service is not user facing
All components include an apdex ❌
Logging includes metadata for measuring scalability ⚪ Not Implemented
Developer guides exist in developer documentation ➖
Reason: external-dns is an infrastructure component, developers do not interact with it
SRE guides exist in runbooks ✅ 1
Metrics on downstream service usage ⚪ Not Implemented
Level 4 Prepared Kibana dashboards ⚪ Not Implemented
Dashboards linked from metrics catalogs ⚪ Not Implemented
Automatic alert routing ⚪ Not Implemented
Level 5 Long-term forecasting utilization and usage ⚪ Not Implemented
70% of requests covered by at least one SLI ⚪ Not Implemented
Automatic alert routing ⚪ Not Implemented

frontend detail

Level Criterion Passed
Level 1 Exists in the service catalog ✅ 1
Structured logs available in Kibana ➖
Reason: Logs from HAProxy are available in BigQuery and are not ingested into Elasticsearch due to volume.
Service exists in the dependency graph ✅ 1
Level 2 SLO monitoring: apdex ❌
SLO monitoring: error rate ✅ 1
SLO monitoring: request rate ✅ 1
Level 3 Service health dashboards ✅ 1, 2
SLA calculations driven from SLO metrics ➖
Reason: Service is not user facing
All components include an apdex ❌
Logging includes metadata for measuring scalability ⚪ Not Implemented
Developer guides exist in developer documentation ✅ 1
SRE guides exist in runbooks ✅ 1
Metrics on downstream service usage ⚪ Not Implemented
Level 4 Prepared Kibana dashboards ⚪ Not Implemented
Dashboards linked from metrics catalogs ⚪ Not Implemented
Automatic alert routing ⚪ Not Implemented
Level 5 Long-term forecasting utilization and usage ⚪ Not Implemented
70% of requests covered by at least one SLI ⚪ Not Implemented
Automatic alert routing ⚪ Not Implemented

git detail

Level Criterion Passed
Level 1 Exists in the service catalog ✅ 1
Structured logs available in Kibana ✅ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13
Service exists in the dependency graph ✅ 1
Level 2 SLO monitoring: apdex ✅ 1
SLO monitoring: error rate ✅ 1
SLO monitoring: request rate ✅ 1
Level 3 Service health dashboards ✅ 1, 2, 3, 4, 5, 6, 7, 8
SLA calculations driven from SLO metrics ⚪ Not Implemented
All components include an apdex ❌
Logging includes metadata for measuring scalability ⚪ Not Implemented
Developer guides exist in developer documentation ✅ 1
SRE guides exist in runbooks ✅ 1
Metrics on downstream service usage ⚪ Not Implemented
Level 4 Prepared Kibana dashboards ⚪ Not Implemented
Dashboards linked from metrics catalogs ⚪ Not Implemented
Automatic alert routing ⚪ Not Implemented
Level 5 Long-term forecasting utilization and usage ⚪ Not Implemented
70% of requests covered by at least one SLI ⚪ Not Implemented
Automatic alert routing ⚪ Not Implemented

gitaly detail

Level Criterion Passed
Level 1 Exists in the service catalog ✅ 1
Structured logs available in Kibana ✅ 1, 2, 3
Service exists in the dependency graph ✅ 1
Level 2 SLO monitoring: apdex ✅ 1
SLO monitoring: error rate ✅ 1
SLO monitoring: request rate ✅ 1
Level 3 Service health dashboards ✅ 1, 2, 3, 4
SLA calculations driven from SLO metrics ➖
Reason: Service is not user facing
All components include an apdex ✅ 1
Logging includes metadata for measuring scalability ⚪ Not Implemented
Developer guides exist in developer documentation ✅ 1
SRE guides exist in runbooks ✅ 1
Metrics on downstream service usage ⚪ Not Implemented
Level 4 Prepared Kibana dashboards ⚪ Not Implemented
Dashboards linked from metrics catalogs ⚪ Not Implemented
Automatic alert routing ⚪ Not Implemented
Level 5 Long-term forecasting utilization and usage ⚪ Not Implemented
70% of requests covered by at least one SLI ⚪ Not Implemented
Automatic alert routing ⚪ Not Implemented

gitlab-static detail

Level Criterion Passed
Level 1 Exists in the service catalog ✅ 1
Structured logs available in Kibana ➖
Reason: Logs from Cloudflare workers are available on demand, but they are not ingested due to volume
Service exists in the dependency graph ➖
Reason: This service is hosted by Cloudflare and does not depend on any other service
Level 2 SLO monitoring: apdex ❌
SLO monitoring: error rate ✅ 1
SLO monitoring: request rate ✅ 1
Level 3 Service health dashboards ✅ 1
SLA calculations driven from SLO metrics ➖
Reason: Service is not user facing
All components include an apdex ❌
Logging includes metadata for measuring scalability ⚪ Not Implemented
Developer guides exist in developer documentation ⚪ Not Implemented
SRE guides exist in runbooks ✅ 1
Metrics on downstream service usage ⚪ Not Implemented
Level 4 Prepared Kibana dashboards ⚪ Not Implemented
Dashboards linked from metrics catalogs ⚪ Not Implemented
Automatic alert routing ⚪ Not Implemented
Level 5 Long-term forecasting utilization and usage ⚪ Not Implemented
70% of requests covered by at least one SLI ⚪ Not Implemented
Automatic alert routing ⚪ Not Implemented

glgo detail

Level Criterion Passed
Level 1 Exists in the service catalog ✅ 1
Structured logs available in Kibana ➖
Reason: Runway structured logs are temporarily available in Stackdriver
Service exists in the dependency graph ➖
Reason: Runway services are deployed outside of the monolith
Level 2 SLO monitoring: apdex ✅ 1
SLO monitoring: error rate ✅ 1
SLO monitoring: request rate ✅ 1
Level 3 Service health dashboards ✅ 1
SLA calculations driven from SLO metrics ➖
Reason: Service is not user facing
All components include an apdex ❌
Logging includes metadata for measuring scalability ⚪ Not Implemented
Developer guides exist in developer documentation ❌
SRE guides exist in runbooks ❌
Metrics on downstream service usage ⚪ Not Implemented
Level 4 Prepared Kibana dashboards ⚪ Not Implemented
Dashboards linked from metrics catalogs ⚪ Not Implemented
Automatic alert routing ⚪ Not Implemented
Level 5 Long-term forecasting utilization and usage ⚪ Not Implemented
70% of requests covered by at least one SLI ⚪ Not Implemented
Automatic alert routing ⚪ Not Implemented

google-cloud-storage detail

Level Criterion Passed
Level 1 Exists in the service catalog ✅ 1
Structured logs available in Kibana ➖
Reason: Access logs of GCS are not enabled due to volume.
Service exists in the dependency graph ✅ 1
Level 2 SLO monitoring: apdex ✅ 1
SLO monitoring: error rate ✅ 1
SLO monitoring: request rate ✅ 1
Level 3 Service health dashboards ✅ 1, 2
SLA calculations driven from SLO metrics ➖
Reason: Service is not user facing
All components include an apdex ❌
Logging includes metadata for measuring scalability ⚪ Not Implemented
Developer guides exist in developer documentation ✅ 1
SRE guides exist in runbooks ✅ 1
Metrics on downstream service usage ⚪ Not Implemented
Level 4 Prepared Kibana dashboards ⚪ Not Implemented
Dashboards linked from metrics catalogs ⚪ Not Implemented
Automatic alert routing ⚪ Not Implemented
Level 5 Long-term forecasting utilization and usage ⚪ Not Implemented
70% of requests covered by at least one SLI ⚪ Not Implemented
Automatic alert routing ⚪ Not Implemented

internal-api detail

Level Criterion Passed
Level 1 Exists in the service catalog ✅ 1
Structured logs available in Kibana ✅ 1, 2, 3, 4, 5, 6
Service exists in the dependency graph ✅ 1
Level 2 SLO monitoring: apdex ✅ 1
SLO monitoring: error rate ✅ 1
SLO monitoring: request rate ✅ 1
Level 3 Service health dashboards ✅ 1, 2, 3, 4, 5, 6, 7
SLA calculations driven from SLO metrics ⚪ Not Implemented
All components include an apdex ✅ 1
Logging includes metadata for measuring scalability ⚪ Not Implemented
Developer guides exist in developer documentation ✅ 1
SRE guides exist in runbooks ✅ 1
Metrics on downstream service usage ⚪ Not Implemented
Level 4 Prepared Kibana dashboards ⚪ Not Implemented
Dashboards linked from metrics catalogs ⚪ Not Implemented
Automatic alert routing ⚪ Not Implemented
Level 5 Long-term forecasting utilization and usage ⚪ Not Implemented
70% of requests covered by at least one SLI ⚪ Not Implemented
Automatic alert routing ⚪ Not Implemented

istio detail

Level Criterion Passed
Level 1 Exists in the service catalog ✅ 1
Structured logs available in Kibana ➖
Reason: Istio service is not deployed in production
Service exists in the dependency graph ➖
Reason: This service does not interact directly with any other services
Level 2 SLO monitoring: apdex ✅ 1
SLO monitoring: error rate ✅ 1
SLO monitoring: request rate ✅ 1
Level 3 Service health dashboards ✅ 1, 2, 3, 4, 5, 6, 7
SLA calculations driven from SLO metrics ➖
Reason: Service is not user facing
All components include an apdex ❌
Logging includes metadata for measuring scalability ⚪ Not Implemented
Developer guides exist in developer documentation ➖
Reason: Istio is an infrastructure component, developers do not interact with it
SRE guides exist in runbooks ✅ 1
Metrics on downstream service usage ⚪ Not Implemented
Level 4 Prepared Kibana dashboards ⚪ Not Implemented
Dashboards linked from metrics catalogs ⚪ Not Implemented
Automatic alert routing ⚪ Not Implemented
Level 5 Long-term forecasting utilization and usage ⚪ Not Implemented
70% of requests covered by at least one SLI ⚪ Not Implemented
Automatic alert routing ⚪ Not Implemented

jaeger detail

Level Criterion Passed
Level 1 Exists in the service catalog ✅ 1
Structured logs available in Kibana ➖
Reason: Jaeger service is not deployed in production
Service exists in the dependency graph ➖
Reason: Jaeger is an independent internal observability tool
Level 2 SLO monitoring: apdex ✅ 1
SLO monitoring: error rate ✅ 1
SLO monitoring: request rate ✅ 1
Level 3 Service health dashboards ✅ 1
SLA calculations driven from SLO metrics ➖
Reason: Service is not user facing
All components include an apdex ❌
Logging includes metadata for measuring scalability ⚪ Not Implemented
Developer guides exist in developer documentation ✅ 1
SRE guides exist in runbooks ✅ 1
Metrics on downstream service usage ⚪ Not Implemented
Level 4 Prepared Kibana dashboards ⚪ Not Implemented
Dashboards linked from metrics catalogs ⚪ Not Implemented
Automatic alert routing ⚪ Not Implemented
Level 5 Long-term forecasting utilization and usage ⚪ Not Implemented
70% of requests covered by at least one SLI ⚪ Not Implemented
Automatic alert routing ⚪ Not Implemented

kas detail

Level Criterion Passed
Level 1 Exists in the service catalog ✅ 1
Structured logs available in Kibana ✅ 1, 2
Service exists in the dependency graph ✅ 1
Level 2 SLO monitoring: apdex ✅ 1
SLO monitoring: error rate ✅ 1
SLO monitoring: request rate ✅ 1
Level 3 Service health dashboards ✅ 1, 2, 3, 4, 5, 6, 7, 8
SLA calculations driven from SLO metrics ➖
Reason: Service is not user facing
All components include an apdex ✅ 1
Logging includes metadata for measuring scalability ⚪ Not Implemented
Developer guides exist in developer documentation ✅ 1
SRE guides exist in runbooks ✅ 1
Metrics on downstream service usage ⚪ Not Implemented
Level 4 Prepared Kibana dashboards ⚪ Not Implemented
Dashboards linked from metrics catalogs ⚪ Not Implemented
Automatic alert routing ⚪ Not Implemented
Level 5 Long-term forecasting utilization and usage ⚪ Not Implemented
70% of requests covered by at least one SLI ⚪ Not Implemented
Automatic alert routing ⚪ Not Implemented

kube detail

Level Criterion Passed
Level 1 Exists in the service catalog ✅ 1
Structured logs available in Kibana ✅ 1
Service exists in the dependency graph ➖
Reason: This service is managed by GKE at the moment. It does not interact directly with any other services
Level 2 SLO monitoring: apdex ✅ 1
SLO monitoring: error rate ✅ 1
SLO monitoring: request rate ✅ 1
Level 3 Service health dashboards ✅ 1
SLA calculations driven from SLO metrics ➖
Reason: Service is not user facing
All components include an apdex ❌
Logging includes metadata for measuring scalability ⚪ Not Implemented
Developer guides exist in developer documentation ➖
Reason: Application logic does not interact with kube
SRE guides exist in runbooks ✅ 1
Metrics on downstream service usage ⚪ Not Implemented
Level 4 Prepared Kibana dashboards ⚪ Not Implemented
Dashboards linked from metrics catalogs ⚪ Not Implemented
Automatic alert routing ⚪ Not Implemented
Level 5 Long-term forecasting utilization and usage ⚪ Not Implemented
70% of requests covered by at least one SLI ⚪ Not Implemented
Automatic alert routing ⚪ Not Implemented

logging detail

Level Criterion Passed
Level 1 Exists in the service catalog ✅ 1
Structured logs available in Kibana ✅ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11
Service exists in the dependency graph ➖
Reason: The logging platform consumes logs via fluentd, but does not interact directly with any other services
Level 2 SLO monitoring: apdex ❌
SLO monitoring: error rate ✅ 1
SLO monitoring: request rate ✅ 1
Level 3 Service health dashboards ✅ 1, 2, 3, 4
SLA calculations driven from SLO metrics ➖
Reason: Service is not user facing
All components include an apdex ❌
Logging includes metadata for measuring scalability ⚪ Not Implemented
Developer guides exist in developer documentation ✅ 1
SRE guides exist in runbooks ✅ 1
Metrics on downstream service usage ⚪ Not Implemented
Level 4 Prepared Kibana dashboards ⚪ Not Implemented
Dashboards linked from metrics catalogs ⚪ Not Implemented
Automatic alert routing ⚪ Not Implemented
Level 5 Long-term forecasting utilization and usage ⚪ Not Implemented
70% of requests covered by at least one SLI ⚪ Not Implemented
Automatic alert routing ⚪ Not Implemented

mailgun detail

Level Criterion Passed
Level 1 Exists in the service catalog ✅ 1
Structured logs available in Kibana ➖
Reason: Mailgun is a vendor
Service exists in the dependency graph ➖
Reason: Mailgun is a vendor
Level 2 SLO monitoring: apdex ✅ 1
SLO monitoring: error rate ✅ 1
SLO monitoring: request rate ✅ 1
Level 3 Service health dashboards ✅ 1
SLA calculations driven from SLO metrics ➖
Reason: Service is not user facing
All components include an apdex ✅ 1
Logging includes metadata for measuring scalability ⚪ Not Implemented
Developer guides exist in developer documentation ⚪ Not Implemented
SRE guides exist in runbooks ⚪ Not Implemented
Metrics on downstream service usage ⚪ Not Implemented
Level 4 Prepared Kibana dashboards ⚪ Not Implemented
Dashboards linked from metrics catalogs ⚪ Not Implemented
Automatic alert routing ⚪ Not Implemented
Level 5 Long-term forecasting utilization and usage ⚪ Not Implemented
70% of requests covered by at least one SLI ⚪ Not Implemented
Automatic alert routing ⚪ Not Implemented

mailroom detail

Level Criterion Passed
Level 1 Exists in the service catalog ✅ 1
Structured logs available in Kibana ✅ 1, 2, 3, 4, 5
Service exists in the dependency graph ✅ 1
Level 2 SLO monitoring: apdex ❌
SLO monitoring: error rate ✅ 1
SLO monitoring: request rate ✅ 1
Level 3 Service health dashboards ✅ 1, 2, 3, 4, 5
SLA calculations driven from SLO metrics ➖
Reason: Service is not user facing
All components include an apdex ❌
Logging includes metadata for measuring scalability ⚪ Not Implemented
Developer guides exist in developer documentation ✅ 1
SRE guides exist in runbooks ✅ 1
Metrics on downstream service usage ⚪ Not Implemented
Level 4 Prepared Kibana dashboards ⚪ Not Implemented
Dashboards linked from metrics catalogs ⚪ Not Implemented
Automatic alert routing ⚪ Not Implemented
Level 5 Long-term forecasting utilization and usage ⚪ Not Implemented
70% of requests covered by at least one SLI ⚪ Not Implemented
Automatic alert routing ⚪ Not Implemented

memorystore detail

Level Criterion Passed
Level 1 Exists in the service catalog ✅ 1
Structured logs available in Kibana ➖
Reason: Memorystore is a managed service of GCP. The logs are available in Stackdriver.
Service exists in the dependency graph ✅ 1
Level 2 SLO monitoring: apdex ❌
SLO monitoring: error rate ❌
SLO monitoring: request rate ❌
Level 3 Service health dashboards ✅ 1
SLA calculations driven from SLO metrics ➖
Reason: Service is not user facing
All components include an apdex ❌
Logging includes metadata for measuring scalability ⚪ Not Implemented
Developer guides exist in developer documentation ➖
Reason: Memorystore is an infrastructure component, powered by GCP
SRE guides exist in runbooks ✅ 1
Metrics on downstream service usage ⚪ Not Implemented
Level 4 Prepared Kibana dashboards ⚪ Not Implemented
Dashboards linked from metrics catalogs ⚪ Not Implemented
Automatic alert routing ⚪ Not Implemented
Level 5 Long-term forecasting utilization and usage ⚪ Not Implemented
70% of requests covered by at least one SLI ⚪ Not Implemented
Automatic alert routing ⚪ Not Implemented

mimir detail

Level Criterion Passed
Level 1 Exists in the service catalog ✅ 1
Structured logs available in Kibana ✅ 1, 2, 3, 4, 5, 6, 7, 8
Service exists in the dependency graph ➖
Reason: Mimir is an independent internal observability tool. It fetches metrics from other services but has no functional interaction with them.
Level 2 SLO monitoring: apdex ✅ 1
SLO monitoring: error rate ✅ 1
SLO monitoring: request rate ✅ 1
Level 3 Service health dashboards ✅ 1
SLA calculations driven from SLO metrics ➖
Reason: Service is not user facing
All components include an apdex ❌
Logging includes metadata for measuring scalability ⚪ Not Implemented
Developer guides exist in developer documentation ⚪ Not Implemented
SRE guides exist in runbooks ✅ 1
Metrics on downstream service usage ⚪ Not Implemented
Level 4 Prepared Kibana dashboards ⚪ Not Implemented
Dashboards linked from metrics catalogs ⚪ Not Implemented
Automatic alert routing ⚪ Not Implemented
Level 5 Long-term forecasting utilization and usage ⚪ Not Implemented
70% of requests covered by at least one SLI ⚪ Not Implemented
Automatic alert routing ⚪ Not Implemented

monitoring detail

Level Criterion Passed
Level 1 Exists in the service catalog ✅ 1
Structured logs available in Kibana ✅ 1, 2, 3, 4
Service exists in the dependency graph ✅ 1
Level 2 SLO monitoring: apdex ✅ 1
SLO monitoring: error rate ✅ 1
SLO monitoring: request rate ✅ 1
Level 3 Service health dashboards ✅ 1, 2
SLA calculations driven from SLO metrics ➖
Reason: Service is not user facing
All components include an apdex ❌
Logging includes metadata for measuring scalability ⚪ Not Implemented
Developer guides exist in developer documentation ✅ 1
SRE guides exist in runbooks ✅ 1
Metrics on downstream service usage ⚪ Not Implemented
Level 4 Prepared Kibana dashboards ⚪ Not Implemented
Dashboards linked from metrics catalogs ⚪ Not Implemented
Automatic alert routing ⚪ Not Implemented
Level 5 Long-term forecasting utilization and usage ⚪ Not Implemented
70% of requests covered by at least one SLI ⚪ Not Implemented
Automatic alert routing ⚪ Not Implemented

nat detail

Level Criterion Passed
Level 1 Exists in the service catalog ✅ 1
Structured logs available in Kibana ➖
Reason: NAT is managed by GCP, so the logs are available in Stackdriver.
Service exists in the dependency graph ✅ 1
Level 2 SLO monitoring: apdex ❌
SLO monitoring: error rate ✅ 1
SLO monitoring: request rate ✅ 1
Level 3 Service health dashboards ✅ 1, 2
SLA calculations driven from SLO metrics ➖
Reason: Service is not user facing
All components include an apdex ❌
Logging includes metadata for measuring scalability ⚪ Not Implemented
Developer guides exist in developer documentation ➖
Reason: NAT is an infrastructure component, developers do not interact with it
SRE guides exist in runbooks ✅ 1
Metrics on downstream service usage ⚪ Not Implemented
Level 4 Prepared Kibana dashboards ⚪ Not Implemented
Dashboards linked from metrics catalogs ⚪ Not Implemented
Automatic alert routing ⚪ Not Implemented
Level 5 Long-term forecasting utilization and usage ⚪ Not Implemented
70% of requests covered by at least one SLI ⚪ Not Implemented
Automatic alert routing ⚪ Not Implemented

nginx detail

Level Criterion Passed
Level 1 Exists in the service catalog ✅ 1
Structured logs available in Kibana ➖
Reason: Logs from nginx are not ingested into Elasticsearch due to volume. Workhorse logs usually cover the same ground, and the nginx logs are also available in Stackdriver.
Service exists in the dependency graph ✅ 1
Level 2 SLO monitoring: apdex ❌
SLO monitoring: error rate ✅ 1
SLO monitoring: request rate ✅ 1
Level 3 Service health dashboards ✅ 1, 2, 3, 4, 5, 6
SLA calculations driven from SLO metrics ➖
Reason: Service is not user facing
All components include an apdex ❌
Logging includes metadata for measuring scalability ⚪ Not Implemented
Developer guides exist in developer documentation ➖
Reason: Application logic does not interact with nginx
SRE guides exist in runbooks ✅ 1
Metrics on downstream service usage ⚪ Not Implemented
Level 4 Prepared Kibana dashboards ⚪ Not Implemented
Dashboards linked from metrics catalogs ⚪ Not Implemented
Automatic alert routing ⚪ Not Implemented
Level 5 Long-term forecasting utilization and usage ⚪ Not Implemented
70% of requests covered by at least one SLI ⚪ Not Implemented
Automatic alert routing ⚪ Not Implemented

ops-gitlab-net detail

Level Criterion Passed
Level 1 Exists in the service catalog ✅ 1
Structured logs available in Kibana ✅ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19
Service exists in the dependency graph ➖
Reason: ops.gitlab.net is a standalone GitLab deployment
Level 2 SLO monitoring: apdex ✅ 1
SLO monitoring: error rate ✅ 1
SLO monitoring: request rate ✅ 1
Level 3 Service health dashboards ✅ 1, 2, 3, 4
SLA calculations driven from SLO metrics ➖
Reason: Service is not user facing
All components include an apdex ✅ 1
Logging includes metadata for measuring scalability ⚪ Not Implemented
Developer guides exist in developer documentation ⚪ Not Implemented
SRE guides exist in runbooks ⚪ Not Implemented
Metrics on downstream service usage ⚪ Not Implemented
Level 4 Prepared Kibana dashboards ⚪ Not Implemented
Dashboards linked from metrics catalogs ⚪ Not Implemented
Automatic alert routing ⚪ Not Implemented
Level 5 Long-term forecasting utilization and usage ⚪ Not Implemented
70% of requests covered by at least one SLI ⚪ Not Implemented
Automatic alert routing ⚪ Not Implemented

packagecloud detail

Level Criterion Passed
Level 1 Exists in the service catalog ✅ 1
Structured logs available in Kibana ✅ 1, 2
Service exists in the dependency graph ✅ 1
Level 2 SLO monitoring: apdex ❌
SLO monitoring: error rate ✅ 1
SLO monitoring: request rate ✅ 1
Level 3 Service health dashboards ✅ 1, 2, 3, 4
SLA calculations driven from SLO metrics ➖
Reason: Service is not user facing
All components include an apdex ❌
Logging includes metadata for measuring scalability ⚪ Not Implemented
Developer guides exist in developer documentation ⚪ Not Implemented
SRE guides exist in runbooks ✅ 1
Metrics on downstream service usage ⚪ Not Implemented
Level 4 Prepared Kibana dashboards ⚪ Not Implemented
Dashboards linked from metrics catalogs ⚪ Not Implemented
Automatic alert routing ⚪ Not Implemented
Level 5 Long-term forecasting utilization and usage ⚪ Not Implemented
70% of requests covered by at least one SLI ⚪ Not Implemented
Automatic alert routing ⚪ Not Implemented

patroni detail

Level Criterion Passed
Level 1 Exists in the service catalog ✅ 1
Structured logs available in Kibana ✅ 1, 2, 3, 4, 5
Service exists in the dependency graph ✅ 1
Level 2 SLO monitoring: apdex ✅ 1
SLO monitoring: error rate ✅ 1
SLO monitoring: request rate ✅ 1
Level 3 Service health dashboards ✅ 1, 2, 3
SLA calculations driven from SLO metrics ➖
Reason: Service is not user facing
All components include an apdex ❌
Logging includes metadata for measuring scalability ⚪ Not Implemented
Developer guides exist in developer documentation ➖
Reason: patroni is an infrastructure component, developers do not interact with it
SRE guides exist in runbooks ✅ 1
Metrics on downstream service usage ⚪ Not Implemented
Level 4 Prepared Kibana dashboards ⚪ Not Implemented
Dashboards linked from metrics catalogs ⚪ Not Implemented
Automatic alert routing ⚪ Not Implemented
Level 5 Long-term forecasting utilization and usage ⚪ Not Implemented
70% of requests covered by at least one SLI ⚪ Not Implemented
Automatic alert routing ⚪ Not Implemented

patroni-ci detail

Level Criterion Passed
Level 1 Exists in the service catalog ✅ 1
Structured logs available in Kibana ✅ 1, 2, 3, 4, 5
Service exists in the dependency graph ✅ 1
Level 2 SLO monitoring: apdex ✅ 1
SLO monitoring: error rate ✅ 1
SLO monitoring: request rate ✅ 1
Level 3 Service health dashboards ✅ 1
SLA calculations driven from SLO metrics ➖
Reason: Service is not user facing
All components include an apdex ❌
Logging includes metadata for measuring scalability ⚪ Not Implemented
Developer guides exist in developer documentation ➖
Reason: patroni is an infrastructure component, developers do not interact with it
SRE guides exist in runbooks ✅ 1
Metrics on downstream service usage ⚪ Not Implemented
Level 4 Prepared Kibana dashboards ⚪ Not Implemented
Dashboards linked from metrics catalogs ⚪ Not Implemented
Automatic alert routing ⚪ Not Implemented
Level 5 Long-term forecasting utilization and usage ⚪ Not Implemented
70% of requests covered by at least one SLI ⚪ Not Implemented
Automatic alert routing ⚪ Not Implemented

patroni-embedding detail

Level Criterion Passed
Level 1 Exists in the service catalog ✅ 1
Structured logs available in Kibana ✅ 1, 2, 3, 4, 5
Service exists in the dependency graph ✅ 1
Level 2 SLO monitoring: apdex ❌
SLO monitoring: error rate ✅ 1
SLO monitoring: request rate ✅ 1
Level 3 Service health dashboards ✅ 1
SLA calculations driven from SLO metrics ➖
Reason: Service is not user facing
All components include an apdex ❌
Logging includes metadata for measuring scalability ⚪ Not Implemented
Developer guides exist in developer documentation ➖
Reason: patroni is an infrastructure component, developers do not interact with it
SRE guides exist in runbooks ✅ 1
Metrics on downstream service usage ⚪ Not Implemented
Level 4 Prepared Kibana dashboards ⚪ Not Implemented
Dashboards linked from metrics catalogs ⚪ Not Implemented
Automatic alert routing ⚪ Not Implemented
Level 5 Long-term forecasting utilization and usage ⚪ Not Implemented
70% of requests covered by at least one SLI ⚪ Not Implemented
Automatic alert routing ⚪ Not Implemented

patroni-registry detail

Level Criterion Passed
Level 1 Exists in the service catalog ✅ 1
Structured logs available in Kibana ✅ 1, 2, 3, 4, 5
Service exists in the dependency graph ✅ 1
Level 2 SLO monitoring: apdex ❌
SLO monitoring: error rate ✅ 1
SLO monitoring: request rate ✅ 1
Level 3 Service health dashboards ✅ 1
SLA calculations driven from SLO metrics ➖
Reason: Service is not user facing
All components include an apdex ❌
Logging includes metadata for measuring scalability ⚪ Not Implemented
Developer guides exist in developer documentation ➖
Reason: patroni is an infrastructure component, developers do not interact with it
SRE guides exist in runbooks ✅ 1
Metrics on downstream service usage ⚪ Not Implemented
Level 4 Prepared Kibana dashboards ⚪ Not Implemented
Dashboards linked from metrics catalogs ⚪ Not Implemented
Automatic alert routing ⚪ Not Implemented
Level 5 Long-term forecasting utilization and usage ⚪ Not Implemented
70% of requests covered by at least one SLI ⚪ Not Implemented
Automatic alert routing ⚪ Not Implemented

pgbouncer detail

Level Criterion Passed
Level 1 Exists in the service catalog ✅ 1
Structured logs available in Kibana ✅ 1
Service exists in the dependency graph ✅ 1
Level 2 SLO monitoring: apdex ❌
SLO monitoring: error rate ✅ 1
SLO monitoring: request rate ✅ 1
Level 3 Service health dashboards ✅ 1, 2
SLA calculations driven from SLO metrics ➖
Reason: Service is not user facing
All components include an apdex ❌
Logging includes metadata for measuring scalability ⚪ Not Implemented
Developer guides exist in developer documentation ➖
Reason: pgbouncer is an infrastructure component, developers do not interact with it
SRE guides exist in runbooks ✅ 1
Metrics on downstream service usage ⚪ Not Implemented
Level 4 Prepared Kibana dashboards ⚪ Not Implemented
Dashboards linked from metrics catalogs ⚪ Not Implemented
Automatic alert routing ⚪ Not Implemented
Level 5 Long-term forecasting utilization and usage ⚪ Not Implemented
70% of requests covered by at least one SLI ⚪ Not Implemented
Automatic alert routing ⚪ Not Implemented

pgbouncer-ci detail

Level Criterion Passed
Level 1 Exists in the service catalog ✅ 1
Structured logs available in Kibana ✅ 1
Service exists in the dependency graph ✅ 1
Level 2 SLO monitoring: apdex ❌
SLO monitoring: error rate ✅ 1
SLO monitoring: request rate ✅ 1
Level 3 Service health dashboards ✅ 1
SLA calculations driven from SLO metrics ➖
Reason: Service is not user facing
All components include an apdex ❌
Logging includes metadata for measuring scalability ⚪ Not Implemented
Developer guides exist in developer documentation ➖
Reason: pgbouncer is an infrastructure component, developers do not interact with it
SRE guides exist in runbooks ✅ 1
Metrics on downstream service usage ⚪ Not Implemented
Level 4 Prepared Kibana dashboards ⚪ Not Implemented
Dashboards linked from metrics catalogs ⚪ Not Implemented
Automatic alert routing ⚪ Not Implemented
Level 5 Long-term forecasting utilization and usage ⚪ Not Implemented
70% of requests covered by at least one SLI ⚪ Not Implemented
Automatic alert routing ⚪ Not Implemented

pgbouncer-embedding detail

Level Criterion Passed
Level 1 Exists in the service catalog ✅ 1
Structured logs available in Kibana ✅ 1
Service exists in the dependency graph ✅ 1
Level 2 SLO monitoring: apdex ❌
SLO monitoring: error rate ✅ 1
SLO monitoring: request rate ✅ 1
Level 3 Service health dashboards ✅ 1
SLA calculations driven from SLO metrics ➖
Reason: Service is not user facing
All components include an apdex ❌
Logging includes metadata for measuring scalability ⚪ Not Implemented
Developer guides exist in developer documentation ➖
Reason: pgbouncer is an infrastructure component, developers do not interact with it
SRE guides exist in runbooks ✅ 1
Metrics on downstream service usage ⚪ Not Implemented
Level 4 Prepared Kibana dashboards ⚪ Not Implemented
Dashboards linked from metrics catalogs ⚪ Not Implemented
Automatic alert routing ⚪ Not Implemented
Level 5 Long-term forecasting utilization and usage ⚪ Not Implemented
70% of requests covered by at least one SLI ⚪ Not Implemented
Automatic alert routing ⚪ Not Implemented

pgbouncer-registry detail

Level Criterion Passed
Level 1 Exists in the service catalog ✅ 1
Structured logs available in Kibana ✅ 1
Service exists in the dependency graph ✅ 1
Level 2 SLO monitoring: apdex ❌
SLO monitoring: error rate ✅ 1
SLO monitoring: request rate ✅ 1
Level 3 Service health dashboards ✅ 1
SLA calculations driven from SLO metrics ➖
Reason: Service is not user facing
All components include an apdex ❌
Logging includes metadata for measuring scalability ⚪ Not Implemented
Developer guides exist in developer documentation ➖
Reason: pgbouncer is an infrastructure component, developers do not interact with it
SRE guides exist in runbooks ✅ 1
Metrics on downstream service usage ⚪ Not Implemented
Level 4 Prepared Kibana dashboards ⚪ Not Implemented
Dashboards linked from metrics catalogs ⚪ Not Implemented
Automatic alert routing ⚪ Not Implemented
Level 5 Long-term forecasting utilization and usage ⚪ Not Implemented
70% of requests covered by at least one SLI ⚪ Not Implemented
Automatic alert routing ⚪ Not Implemented

plantuml detail

Level Criterion Passed
Level 1 Exists in the service catalog ✅ 1
Structured logs available in Kibana ➖
Reason: The logs are available in Stackdriver.
Service exists in the dependency graph ➖
Reason: PlantUML is a stateless web application that generates UML diagrams on the fly. The rendered Markdown points to the PlantUML server in the frontends. It does not interact with any declared services.
Level 2 SLO monitoring: apdex ❌
SLO monitoring: error rate ✅ 1
SLO monitoring: request rate ✅ 1
Level 3 Service health dashboards ✅ 1, 2, 3, 4
SLA calculations driven from SLO metrics ➖
Reason: Service is not user facing
All components include an apdex ❌
Logging includes metadata for measuring scalability ⚪ Not Implemented
Developer guides exist in developer documentation ✅ 1
SRE guides exist in runbooks ✅ 1
Metrics on downstream service usage ⚪ Not Implemented
Level 4 Prepared Kibana dashboards ⚪ Not Implemented
Dashboards linked from metrics catalogs ⚪ Not Implemented
Automatic alert routing ⚪ Not Implemented
Level 5 Long-term forecasting utilization and usage ⚪ Not Implemented
70% of requests covered by at least one SLI ⚪ Not Implemented
Automatic alert routing ⚪ Not Implemented

postgres-archive detail

Level Criterion Passed
Level 1 Exists in the service catalog ✅ 1
Structured logs available in Kibana ✅ 1, 2
Service exists in the dependency graph ✅ 1
Level 2 SLO monitoring: apdex ❌
SLO monitoring: error rate ✅ 1
SLO monitoring: request rate ✅ 1
Level 3 Service health dashboards ✅ 1
SLA calculations driven from SLO metrics ➖
Reason: Service is not user facing
All components include an apdex ❌
Logging includes metadata for measuring scalability ⚪ Not Implemented
Developer guides exist in developer documentation ➖
Reason: postgres-archive is an infrastructure component, developers do not interact with it
SRE guides exist in runbooks ✅ 1
Metrics on downstream service usage ⚪ Not Implemented
Level 4 Prepared Kibana dashboards ⚪ Not Implemented
Dashboards linked from metrics catalogs ⚪ Not Implemented
Automatic alert routing ⚪ Not Implemented
Level 5 Long-term forecasting utilization and usage ⚪ Not Implemented
70% of requests covered by at least one SLI ⚪ Not Implemented
Automatic alert routing ⚪ Not Implemented

redis detail

Level Criterion Passed
Level 1 Exists in the service catalog ✅ 1
Structured logs available in Kibana ✅ 1, 2
Service exists in the dependency graph ✅ 1
Level 2 SLO monitoring: apdex ✅ 1
SLO monitoring: error rate ✅ 1
SLO monitoring: request rate ✅ 1
Level 3 Service health dashboards ✅ 1
SLA calculations driven from SLO metrics ➖
Reason: Service is not user facing
All components include an apdex ✅ 1
Logging includes metadata for measuring scalability ➖
Reason: Metadata can't be injected in redis logs
Developer guides exist in developer documentation ✅ 1
SRE guides exist in runbooks ✅ 1
Metrics on downstream service usage ⚪ Not Implemented
Level 4 Prepared Kibana dashboards ⚪ Not Implemented
Dashboards linked from metrics catalogs ⚪ Not Implemented
Automatic alert routing ⚪ Not Implemented
Level 5 Long-term forecasting utilization and usage ⚪ Not Implemented
70% of requests covered by at least one SLI ⚪ Not Implemented
Automatic alert routing ⚪ Not Implemented

redis-cluster-cache detail

Level Criterion Passed
Level 1 Exists in the service catalog ✅ 1
Structured logs available in Kibana ✅ 1, 2
Service exists in the dependency graph ✅ 1
Level 2 SLO monitoring: apdex ✅ 1
SLO monitoring: error rate ✅ 1
SLO monitoring: request rate ✅ 1
Level 3 Service health dashboards ✅ 1
SLA calculations driven from SLO metrics ➖
Reason: Service is not user facing
All components include an apdex ✅ 1
Logging includes metadata for measuring scalability ➖
Reason: Metadata can't be injected in redis logs
Developer guides exist in developer documentation ✅ 1
SRE guides exist in runbooks ✅ 1
Metrics on downstream service usage ⚪ Not Implemented
Level 4 Prepared Kibana dashboards ⚪ Not Implemented
Dashboards linked from metrics catalogs ⚪ Not Implemented
Automatic alert routing ⚪ Not Implemented
Level 5 Long-term forecasting utilization and usage ⚪ Not Implemented
70% of requests covered by at least one SLI ⚪ Not Implemented
Automatic alert routing ⚪ Not Implemented

redis-cluster-chat-cache detail

Level Criterion Passed
Level 1 Exists in the service catalog ✅ 1
Structured logs available in Kibana ✅ 1, 2
Service exists in the dependency graph ✅ 1
Level 2 SLO monitoring: apdex ✅ 1
SLO monitoring: error rate ✅ 1
SLO monitoring: request rate ✅ 1
Level 3 Service health dashboards ✅ 1
SLA calculations driven from SLO metrics ➖
Reason: Service is not user facing
All components include an apdex ✅ 1
Logging includes metadata for measuring scalability ➖
Reason: Metadata can't be injected in redis logs
Developer guides exist in developer documentation ✅ 1
SRE guides exist in runbooks ✅ 1
Metrics on downstream service usage ⚪ Not Implemented
Level 4 Prepared Kibana dashboards ⚪ Not Implemented
Dashboards linked from metrics catalogs ⚪ Not Implemented
Automatic alert routing ⚪ Not Implemented
Level 5 Long-term forecasting utilization and usage ⚪ Not Implemented
70% of requests covered by at least one SLI ⚪ Not Implemented
Automatic alert routing ⚪ Not Implemented

redis-cluster-feature-flag detail

Level Criterion Passed
Level 1 Exists in the service catalog ✅ 1
Structured logs available in Kibana ✅ 1, 2
Service exists in the dependency graph ✅ 1
Level 2 SLO monitoring: apdex ✅ 1
SLO monitoring: error rate ✅ 1
SLO monitoring: request rate ✅ 1
Level 3 Service health dashboards ✅ 1
SLA calculations driven from SLO metrics ➖
Reason: Service is not user facing
All components include an apdex ✅ 1
Logging includes metadata for measuring scalability ➖
Reason: Metadata can't be injected in redis logs
Developer guides exist in developer documentation ✅ 1
SRE guides exist in runbooks ✅ 1
Metrics on downstream service usage ⚪ Not Implemented
Level 4 Prepared Kibana dashboards ⚪ Not Implemented
Dashboards linked from metrics catalogs ⚪ Not Implemented
Automatic alert routing ⚪ Not Implemented
Level 5 Long-term forecasting utilization and usage ⚪ Not Implemented
70% of requests covered by at least one SLI ⚪ Not Implemented
Automatic alert routing ⚪ Not Implemented

redis-cluster-queues-meta detail

Level Criterion Passed
Level 1 Exists in the service catalog ✅ 1
Structured logs available in Kibana ✅ 1, 2
Service exists in the dependency graph ✅ 1
Level 2 SLO monitoring: apdex ✅ 1
SLO monitoring: error rate ✅ 1
SLO monitoring: request rate ✅ 1
Level 3 Service health dashboards ✅ 1
SLA calculations driven from SLO metrics ➖
Reason: Service is not user facing
All components include an apdex ✅ 1
Logging includes metadata for measuring scalability ➖
Reason: Metadata can't be injected in redis logs
Developer guides exist in developer documentation ✅ 1
SRE guides exist in runbooks ✅ 1
Metrics on downstream service usage ⚪ Not Implemented
Level 4 Prepared Kibana dashboards ⚪ Not Implemented
Dashboards linked from metrics catalogs ⚪ Not Implemented
Automatic alert routing ⚪ Not Implemented
Level 5 Long-term forecasting utilization and usage ⚪ Not Implemented
70% of requests covered by at least one SLI ⚪ Not Implemented
Automatic alert routing ⚪ Not Implemented

redis-cluster-ratelimiting detail

Level Criterion Passed
Level 1 Exists in the service catalog ✅ 1
Structured logs available in Kibana ✅ 1, 2
Service exists in the dependency graph ✅ 1
Level 2 SLO monitoring: apdex ✅ 1
SLO monitoring: error rate ✅ 1
SLO monitoring: request rate ✅ 1
Level 3 Service health dashboards ✅ 1
SLA calculations driven from SLO metrics ➖
Reason: Service is not user facing
All components include an apdex ✅ 1
Logging includes metadata for measuring scalability ➖
Reason: Metadata can't be injected in redis logs
Developer guides exist in developer documentation ✅ 1
SRE guides exist in runbooks ⚪ Not Implemented
Metrics on downstream service usage ⚪ Not Implemented
Level 4 Prepared Kibana dashboards ⚪ Not Implemented
Dashboards linked from metrics catalogs ⚪ Not Implemented
Automatic alert routing ⚪ Not Implemented
Level 5 Long-term forecasting utilization and usage ⚪ Not Implemented
70% of requests covered by at least one SLI ⚪ Not Implemented
Automatic alert routing ⚪ Not Implemented

redis-cluster-repo-cache detail

Level Criterion Passed
Level 1 Exists in the service catalog ✅ 1
Structured logs available in Kibana ✅ 1, 2
Service exists in the dependency graph ✅ 1
Level 2 SLO monitoring: apdex ✅ 1
SLO monitoring: error rate ✅ 1
SLO monitoring: request rate ✅ 1
Level 3 Service health dashboards ✅ 1
SLA calculations driven from SLO metrics ➖
Reason: Service is not user facing
All components include an apdex ✅ 1
Logging includes metadata for measuring scalability ➖
Reason: Metadata can't be injected in redis logs
Developer guides exist in developer documentation ✅ 1
SRE guides exist in runbooks ✅ 1
Metrics on downstream service usage ⚪ Not Implemented
Level 4 Prepared Kibana dashboards ⚪ Not Implemented
Dashboards linked from metrics catalogs ⚪ Not Implemented
Automatic alert routing ⚪ Not Implemented
Level 5 Long-term forecasting utilization and usage ⚪ Not Implemented
70% of requests covered by at least one SLI ⚪ Not Implemented
Automatic alert routing ⚪ Not Implemented

redis-cluster-shared-state detail

Level Criterion Passed
Level 1 Exists in the service catalog ✅ 1
Structured logs available in Kibana ✅ 1, 2
Service exists in the dependency graph ✅ 1
Level 2 SLO monitoring: apdex ✅ 1
SLO monitoring: error rate ✅ 1
SLO monitoring: request rate ✅ 1
Level 3 Service health dashboards ✅ 1
SLA calculations driven from SLO metrics ➖
Reason: Service is not user facing
All components include an apdex ✅ 1
Logging includes metadata for measuring scalability ➖
Reason: Metadata can't be injected in redis logs
Developer guides exist in developer documentation ✅ 1
SRE guides exist in runbooks ✅ 1
Metrics on downstream service usage ⚪ Not Implemented
Level 4 Prepared Kibana dashboards ⚪ Not Implemented
Dashboards linked from metrics catalogs ⚪ Not Implemented
Automatic alert routing ⚪ Not Implemented
Level 5 Long-term forecasting utilization and usage ⚪ Not Implemented
70% of requests covered by at least one SLI ⚪ Not Implemented
Automatic alert routing ⚪ Not Implemented

redis-db-load-balancing detail

Level Criterion Passed
Level 1 Exists in the service catalog ✅ 1
Structured logs available in Kibana ✅ 1, 2
Service exists in the dependency graph ✅ 1
Level 2 SLO monitoring: apdex ✅ 1
SLO monitoring: error rate ✅ 1
SLO monitoring: request rate ✅ 1
Level 3 Service health dashboards ✅ 1
SLA calculations driven from SLO metrics ➖
Reason: Service is not user facing
All components include an apdex ✅ 1
Logging includes metadata for measuring scalability ➖
Reason: Metadata can't be injected in redis logs
Developer guides exist in developer documentation ✅ 1
SRE guides exist in runbooks ✅ 1
Metrics on downstream service usage ⚪ Not Implemented
Level 4 Prepared Kibana dashboards ⚪ Not Implemented
Dashboards linked from metrics catalogs ⚪ Not Implemented
Automatic alert routing ⚪ Not Implemented
Level 5 Long-term forecasting utilization and usage ⚪ Not Implemented
70% of requests covered by at least one SLI ⚪ Not Implemented
Automatic alert routing ⚪ Not Implemented

redis-pubsub detail

Level Criterion Passed
Level 1 Exists in the service catalog ✅ 1
Structured logs available in Kibana ✅ 1, 2
Service exists in the dependency graph ✅ 1
Level 2 SLO monitoring: apdex ✅ 1
SLO monitoring: error rate ✅ 1
SLO monitoring: request rate ✅ 1
Level 3 Service health dashboards ✅ 1
SLA calculations driven from SLO metrics ➖
Reason: Service is not user facing
All components include an apdex ✅ 1
Logging includes metadata for measuring scalability ➖
Reason: Metadata can't be injected in redis logs
Developer guides exist in developer documentation ✅ 1
SRE guides exist in runbooks ✅ 1
Metrics on downstream service usage ⚪ Not Implemented
Level 4 Prepared Kibana dashboards ⚪ Not Implemented
Dashboards linked from metrics catalogs ⚪ Not Implemented
Automatic alert routing ⚪ Not Implemented
Level 5 Long-term forecasting utilization and usage ⚪ Not Implemented
70% of requests covered by at least one SLI ⚪ Not Implemented
Automatic alert routing ⚪ Not Implemented

redis-registry-cache detail

Level Criterion Passed
Level 1 Exists in the service catalog ✅ 1
Structured logs available in Kibana ✅ 1, 2
Service exists in the dependency graph ✅ 1
Level 2 SLO monitoring: apdex ✅ 1
SLO monitoring: error rate ✅ 1
SLO monitoring: request rate ✅ 1
Level 3 Service health dashboards ✅ 1
SLA calculations driven from SLO metrics ➖
Reason: Service is not user facing
All components include an apdex ✅ 1
Logging includes metadata for measuring scalability ➖
Reason: Metadata can't be injected in redis logs
Developer guides exist in developer documentation ❌
SRE guides exist in runbooks ❌
Metrics on downstream service usage ⚪ Not Implemented
Level 4 Prepared Kibana dashboards ⚪ Not Implemented
Dashboards linked from metrics catalogs ⚪ Not Implemented
Automatic alert routing ⚪ Not Implemented
Level 5 Long-term forecasting utilization and usage ⚪ Not Implemented
70% of requests covered by at least one SLI ⚪ Not Implemented
Automatic alert routing ⚪ Not Implemented

redis-sessions detail

Level Criterion Passed
Level 1 Exists in the service catalog ✅ 1
Structured logs available in Kibana ✅ 1, 2
Service exists in the dependency graph ✅ 1
Level 2 SLO monitoring: apdex ✅ 1
SLO monitoring: error rate ✅ 1
SLO monitoring: request rate ✅ 1
Level 3 Service health dashboards ✅ 1
SLA calculations driven from SLO metrics ➖
Reason: Service is not user facing
All components include an apdex ✅ 1
Logging includes metadata for measuring scalability ➖
Reason: Metadata can't be injected in redis logs
Developer guides exist in developer documentation ✅ 1
SRE guides exist in runbooks ✅ 1
Metrics on downstream service usage ⚪ Not Implemented
Level 4 Prepared Kibana dashboards ⚪ Not Implemented
Dashboards linked from metrics catalogs ⚪ Not Implemented
Automatic alert routing ⚪ Not Implemented
Level 5 Long-term forecasting utilization and usage ⚪ Not Implemented
70% of requests covered by at least one SLI ⚪ Not Implemented
Automatic alert routing ⚪ Not Implemented

redis-sidekiq detail

Level Criterion Passed
Level 1 Exists in the service catalog ✅ 1
Structured logs available in Kibana ✅ 1, 2
Service exists in the dependency graph ✅ 1
Level 2 SLO monitoring: apdex ✅ 1
SLO monitoring: error rate ✅ 1
SLO monitoring: request rate ✅ 1
Level 3 Service health dashboards ✅ 1
SLA calculations driven from SLO metrics ➖
Reason: Service is not user facing
All components include an apdex ✅ 1
Logging includes metadata for measuring scalability ➖
Reason: Metadata can't be injected in redis logs
Developer guides exist in developer documentation ✅ 1
SRE guides exist in runbooks ✅ 1
Metrics on downstream service usage ⚪ Not Implemented
Level 4 Prepared Kibana dashboards ⚪ Not Implemented
Dashboards linked from metrics catalogs ⚪ Not Implemented
Automatic alert routing ⚪ Not Implemented
Level 5 Long-term forecasting utilization and usage ⚪ Not Implemented
70% of requests covered by at least one SLI ⚪ Not Implemented
Automatic alert routing ⚪ Not Implemented

redis-tracechunks detail

Level Criterion Passed
Level 1 Exists in the service catalog ✅ 1
Structured logs available in Kibana ✅ 1, 2
Service exists in the dependency graph ✅ 1
Level 2 SLO monitoring: apdex ✅ 1
SLO monitoring: error rate ✅ 1
SLO monitoring: request rate ✅ 1
Level 3 Service health dashboards ✅ 1
SLA calculations driven from SLO metrics ➖
Reason: Service is not user facing
All components include an apdex ✅ 1
Logging includes metadata for measuring scalability ➖
Reason: Metadata can't be injected in redis logs
Developer guides exist in developer documentation ✅ 1
SRE guides exist in runbooks ✅ 1
Metrics on downstream service usage ⚪ Not Implemented
Level 4 Prepared Kibana dashboards ⚪ Not Implemented
Dashboards linked from metrics catalogs ⚪ Not Implemented
Automatic alert routing ⚪ Not Implemented
Level 5 Long-term forecasting utilization and usage ⚪ Not Implemented
70% of requests covered by at least one SLI ⚪ Not Implemented
Automatic alert routing ⚪ Not Implemented

registry detail

Level Criterion Passed
Level 1 Exists in the service catalog ✅ 1
Structured logs available in Kibana ✅ 1, 2, 3, 4, 5
Service exists in the dependency graph ✅ 1
Level 2 SLO monitoring: apdex ✅ 1
SLO monitoring: error rate ✅ 1
SLO monitoring: request rate ✅ 1
Level 3 Service health dashboards ✅ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13
SLA calculations driven from SLO metrics ⚪ Not Implemented
All components include an apdex ❌
Logging includes metadata for measuring scalability ⚪ Not Implemented
Developer guides exist in developer documentation ✅ 1
SRE guides exist in runbooks ✅ 1
Metrics on downstream service usage ⚪ Not Implemented
Level 4 Prepared Kibana dashboards ⚪ Not Implemented
Dashboards linked from metrics catalogs ⚪ Not Implemented
Automatic alert routing ⚪ Not Implemented
Level 5 Long-term forecasting utilization and usage ⚪ Not Implemented
70% of requests covered by at least one SLI ⚪ Not Implemented
Automatic alert routing ⚪ Not Implemented

runway detail

Level Criterion Passed
Level 1 Exists in the service catalog ✅ 1
Structured logs available in Kibana ➖
Reason: Runway is a platform. The logs are available in Stackdriver.
Service exists in the dependency graph ✅ 1
Level 2 SLO monitoring: apdex ❌
SLO monitoring: error rate ❌
SLO monitoring: request rate ❌
Level 3 Service health dashboards ✅ 1, 2, 3
SLA calculations driven from SLO metrics ➖
Reason: Service is not user facing
All components include an apdex ❌
Logging includes metadata for measuring scalability ⚪ Not Implemented
Developer guides exist in developer documentation ✅ 1
SRE guides exist in runbooks ✅ 1
Metrics on downstream service usage ⚪ Not Implemented
Level 4 Prepared Kibana dashboards ⚪ Not Implemented
Dashboards linked from metrics catalogs ⚪ Not Implemented
Automatic alert routing ⚪ Not Implemented
Level 5 Long-term forecasting utilization and usage ⚪ Not Implemented
70% of requests covered by at least one SLI ⚪ Not Implemented
Automatic alert routing ⚪ Not Implemented

search detail

Level Criterion Passed
Level 1 Exists in the service catalog ✅ 1
Structured logs available in Kibana ✅ 1, 2
Service exists in the dependency graph ✅ 1
Level 2 SLO monitoring: apdex ✅ 1
SLO monitoring: error rate ❌
SLO monitoring: request rate ✅ 1
Level 3 Service health dashboards ✅ 1, 2
SLA calculations driven from SLO metrics ➖
Reason: Service is not user facing
All components include an apdex ❌
Logging includes metadata for measuring scalability ⚪ Not Implemented
Developer guides exist in developer documentation ✅ 1
SRE guides exist in runbooks ✅ 1
Metrics on downstream service usage ⚪ Not Implemented
Level 4 Prepared Kibana dashboards ⚪ Not Implemented
Dashboards linked from metrics catalogs ⚪ Not Implemented
Automatic alert routing ⚪ Not Implemented
Level 5 Long-term forecasting utilization and usage ⚪ Not Implemented
70% of requests covered by at least one SLI ⚪ Not Implemented
Automatic alert routing ⚪ Not Implemented

sentry detail

Level Criterion Passed
Level 1 Exists in the service catalog ✅ 1
Structured logs available in Kibana ➖
Reason: We are migrating our self-managed Sentry instance to the hosted one. For more information: https://gitlab.com/gitlab-com/gl-infra/reliability/-/issues/13963. Besides, Sentry logs are also available in Stackdriver.
Service exists in the dependency graph ➖
Reason: Sentry is an independent internal observability tool
Level 2 SLO monitoring: apdex ✅ 1
SLO monitoring: error rate ✅ 1
SLO monitoring: request rate ✅ 1
Level 3 Service health dashboards ✅ 1, 2, 3
SLA calculations driven from SLO metrics ➖
Reason: Service is not user facing
All components include an apdex ❌
Logging includes metadata for measuring scalability ⚪ Not Implemented
Developer guides exist in developer documentation ✅ 1
SRE guides exist in runbooks ✅ 1
Metrics on downstream service usage ⚪ Not Implemented
Level 4 Prepared Kibana dashboards ⚪ Not Implemented
Dashboards linked from metrics catalogs ⚪ Not Implemented
Automatic alert routing ⚪ Not Implemented
Level 5 Long-term forecasting utilization and usage ⚪ Not Implemented
70% of requests covered by at least one SLI ⚪ Not Implemented
Automatic alert routing ⚪ Not Implemented

sidekiq detail

Level Criterion Passed
Level 1 Exists in the service catalog ✅ 1
Structured logs available in Kibana ✅ 1, 2, 3, 4, 5, 6, 7
Service exists in the dependency graph ✅ 1
Level 2 SLO monitoring: apdex ✅ 1
SLO monitoring: error rate ✅ 1
SLO monitoring: request rate ✅ 1
Level 3 Service health dashboards ✅ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12
SLA calculations driven from SLO metrics ⚪ Not Implemented
All components include an apdex ❌
Logging includes metadata for measuring scalability ⚪ Not Implemented
Developer guides exist in developer documentation ✅ 1
SRE guides exist in runbooks ✅ 1
Metrics on downstream service usage ⚪ Not Implemented
Level 4 Prepared Kibana dashboards ⚪ Not Implemented
Dashboards linked from metrics catalogs ⚪ Not Implemented
Automatic alert routing ⚪ Not Implemented
Level 5 Long-term forecasting utilization and usage ⚪ Not Implemented
70% of requests covered by at least one SLI ⚪ Not Implemented
Automatic alert routing ⚪ Not Implemented

thanos detail

Level Criterion Passed
Level 1 Exists in the service catalog ✅ 1
Structured logs available in Kibana ✅ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12
Service exists in the dependency graph ➖
Reason: Thanos is an independent internal observability tool. It fetches metrics from other services but does not functionally interact with them.
Level 2 SLO monitoring: apdex ✅ 1
SLO monitoring: error rate ✅ 1
SLO monitoring: request rate ✅ 1
Level 3 Service health dashboards ✅ 1, 2, 3, 4, 5
SLA calculations driven from SLO metrics ➖
Reason: Service is not user facing
All components include an apdex ❌
Logging includes metadata for measuring scalability ⚪ Not Implemented
Developer guides exist in developer documentation ⚪ Not Implemented
SRE guides exist in runbooks ✅ 1
Metrics on downstream service usage ⚪ Not Implemented
Level 4 Prepared Kibana dashboards ⚪ Not Implemented
Dashboards linked from metrics catalogs ⚪ Not Implemented
Automatic alert routing ⚪ Not Implemented
Level 5 Long-term forecasting utilization and usage ⚪ Not Implemented
70% of requests covered by at least one SLI ⚪ Not Implemented
Automatic alert routing ⚪ Not Implemented

tracing detail

Level Criterion Passed
Level 1 Exists in the service catalog ✅ 1
Structured logs available in Kibana ✅ 1
Service exists in the dependency graph ✅ 1
Level 2 SLO monitoring: apdex ✅ 1
SLO monitoring: error rate ✅ 1
SLO monitoring: request rate ✅ 1
Level 3 Service health dashboards ✅ 1
SLA calculations driven from SLO metrics ➖
Reason: Service is not user facing
All components include an apdex ✅ 1
Logging includes metadata for measuring scalability ⚪ Not Implemented
Developer guides exist in developer documentation ✅ 1
SRE guides exist in runbooks ❌
Metrics on downstream service usage ⚪ Not Implemented
Level 4 Prepared Kibana dashboards ⚪ Not Implemented
Dashboards linked from metrics catalogs ⚪ Not Implemented
Automatic alert routing ⚪ Not Implemented
Level 5 Long-term forecasting utilization and usage ⚪ Not Implemented
70% of requests covered by at least one SLI ⚪ Not Implemented
Automatic alert routing ⚪ Not Implemented

vault detail

Level Criterion Passed
Level 1 Exists in the service catalog ✅ 1
Structured logs available in Kibana ➖
Reason: Vault is a pending project at the moment. There is no traffic at the moment. We'll add logs and metrics in https://gitlab.com/groups/gitlab-com/gl-infra/-/epics/739
Service exists in the dependency graph ➖
Reason: Vault is a pending project at the moment. There is no traffic at the moment. The progress can be tracked at https://gitlab.com/groups/gitlab-com/gl-infra/-/epics/739
Level 2 SLO monitoring: apdex ✅ 1
SLO monitoring: error rate ✅ 1
SLO monitoring: request rate ✅ 1
Level 3 Service health dashboards ✅ 1
SLA calculations driven from SLO metrics ➖
Reason: Service is not user facing
All components include an apdex ❌
Logging includes metadata for measuring scalability ⚪ Not Implemented
Developer guides exist in developer documentation ➖
Reason: Vault is an infrastructure component; developers do not interact with it
SRE guides exist in runbooks ✅ 1
Metrics on downstream service usage ⚪ Not Implemented
Level 4 Prepared Kibana dashboards ⚪ Not Implemented
Dashboards linked from metrics catalogs ⚪ Not Implemented
Automatic alert routing ⚪ Not Implemented
Level 5 Long-term forecasting utilization and usage ⚪ Not Implemented
70% of requests covered by at least one SLI ⚪ Not Implemented
Automatic alert routing ⚪ Not Implemented

web detail

Level Criterion Passed
Level 1 Exists in the service catalog ✅ 1
Structured logs available in Kibana ✅ 1, 2, 3, 4, 5, 6, 7, 8
Service exists in the dependency graph ✅ 1
Level 2 SLO monitoring: apdex ✅ 1
SLO monitoring: error rate ✅ 1
SLO monitoring: request rate ✅ 1
Level 3 Service health dashboards ✅ 1, 2, 3, 4, 5, 6, 7
SLA calculations driven from SLO metrics ⚪ Not Implemented
All components include an apdex ❌
Logging includes metadata for measuring scalability ⚪ Not Implemented
Developer guides exist in developer documentation ✅ 1
SRE guides exist in runbooks ✅ 1
Metrics on downstream service usage ⚪ Not Implemented
Level 4 Prepared Kibana dashboards ⚪ Not Implemented
Dashboards linked from metrics catalogs ⚪ Not Implemented
Automatic alert routing ⚪ Not Implemented
Level 5 Long-term forecasting utilization and usage ⚪ Not Implemented
70% of requests covered by at least one SLI ⚪ Not Implemented
Automatic alert routing ⚪ Not Implemented

web-pages detail

Level Criterion Passed
Level 1 Exists in the service catalog ✅ 1
Structured logs available in Kibana ✅ 1, 2
Service exists in the dependency graph ✅ 1
Level 2 SLO monitoring: apdex ✅ 1
SLO monitoring: error rate ✅ 1
SLO monitoring: request rate ✅ 1
Level 3 Service health dashboards ✅ 1, 2, 3, 4, 5, 6, 7
SLA calculations driven from SLO metrics ⚪ Not Implemented
All components include an apdex ❌
Logging includes metadata for measuring scalability ⚪ Not Implemented
Developer guides exist in developer documentation ✅ 1
SRE guides exist in runbooks ✅ 1
Metrics on downstream service usage ⚪ Not Implemented
Level 4 Prepared Kibana dashboards ⚪ Not Implemented
Dashboards linked from metrics catalogs ⚪ Not Implemented
Automatic alert routing ⚪ Not Implemented
Level 5 Long-term forecasting utilization and usage ⚪ Not Implemented
70% of requests covered by at least one SLI ⚪ Not Implemented
Automatic alert routing ⚪ Not Implemented

websockets detail

Level Criterion Passed
Level 1 Exists in the service catalog ✅ 1
Structured logs available in Kibana ✅ 1, 2, 3, 4, 5, 6
Service exists in the dependency graph ✅ 1
Level 2 SLO monitoring: apdex ✅ 1
SLO monitoring: error rate ✅ 1
SLO monitoring: request rate ✅ 1
Level 3 Service health dashboards ✅ 1, 2, 3, 4, 5, 6
SLA calculations driven from SLO metrics ➖
Reason: Service is not user facing
All components include an apdex ❌
Logging includes metadata for measuring scalability ⚪ Not Implemented
Developer guides exist in developer documentation ✅ 1
SRE guides exist in runbooks ✅ 1
Metrics on downstream service usage ⚪ Not Implemented
Level 4 Prepared Kibana dashboards ⚪ Not Implemented
Dashboards linked from metrics catalogs ⚪ Not Implemented
Automatic alert routing ⚪ Not Implemented
Level 5 Long-term forecasting utilization and usage ⚪ Not Implemented
70% of requests covered by at least one SLI ⚪ Not Implemented
Automatic alert routing ⚪ Not Implemented

woodhouse detail

Level Criterion Passed
Level 1 Exists in the service catalog ✅ 1
Structured logs available in Kibana ➖
Reason: Log volume is very low; tooling links to Stackdriver are provided, which is sufficient for this purpose
Service exists in the dependency graph ✅ 1
Level 2 SLO monitoring: apdex ✅ 1
SLO monitoring: error rate ✅ 1
SLO monitoring: request rate ✅ 1
Level 3 Service health dashboards ✅ 1
SLA calculations driven from SLO metrics ➖
Reason: Service is not user facing
All components include an apdex ✅ 1
Logging includes metadata for measuring scalability ⚪ Not Implemented
Developer guides exist in developer documentation ✅ 1
SRE guides exist in runbooks ✅ 1
Metrics on downstream service usage ⚪ Not Implemented
Level 4 Prepared Kibana dashboards ⚪ Not Implemented
Dashboards linked from metrics catalogs ⚪ Not Implemented
Automatic alert routing ⚪ Not Implemented
Level 5 Long-term forecasting utilization and usage ⚪ Not Implemented
70% of requests covered by at least one SLI ⚪ Not Implemented
Automatic alert routing ⚪ Not Implemented