Service Maturity Model

Introduction

This page shows the output of our service maturity model for each service in our metrics catalog. The model itself is part of the metrics catalog, and uses information from the metrics catalog and the service catalog to score each service.

To achieve a particular level in the maturity model, a service must meet all the criteria for that level and all previous levels. Some criteria do not apply to all services (for instance, services like PgBouncer do not need development documentation).
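The scoring rule above can be sketched in a few lines of Python. This is only an illustration, not the model's actual implementation: the `Status` enum and both function names are hypothetical. It also assumes, based on the tables on this page, that skipped and not-yet-measured criteria do not block a level, and that a level needs at least one passing criterion to count.

```python
from enum import Enum


class Status(Enum):
    PASSED = "passed"
    FAILED = "failed"
    SKIPPED = "skipped"        # criterion does not apply to this service
    UNMEASURED = "unmeasured"  # criterion is not evaluated yet


def level_achieved(statuses):
    """Assumed rule: no criterion failed and at least one criterion passed."""
    return Status.FAILED not in statuses and Status.PASSED in statuses


def maturity_level(levels):
    """Return the highest level reached, given per-level status lists
    ordered Level 1, Level 2, and so on. 0 means Level 1 was not reached."""
    achieved = 0
    for statuses in levels:
        if not level_achieved(statuses):
            break  # a level only counts if all previous levels were met
        achieved += 1
    return achieved


if __name__ == "__main__":
    P, F, S = Status.PASSED, Status.FAILED, Status.SKIPPED
    # Shaped like the atlantis rows: Level 1 has one pass and two skips,
    # Level 2 has a failing apdex criterion.
    print(maturity_level([[P, S, S], [F, P, P]]))
```

Stopping at the first level that is not achieved encodes the "all previous levels" requirement: a service that fails Level 2 stays at Level 1 even if every Level 3 criterion passes.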

Maturity score by service

❌ indicates that the service does not meet even the Level 1 criteria

Service	Level
ai-assisted	Level 3
ai-gateway	Level 2
api	Level 2
atlantis	Level 1
camoproxy	Level 2
ci-runners	Level 2
cloud-sql	Level 1
cloudflare	Level 1
consul	Level 1
customersdot	Level 3
errortracking	Level 2
ext-pvs	Level 3
external-dns	Level 1
frontend	Level 1
git	Level 2
gitaly	Level 3
gitlab-static	Level 1
glgo	Level 2
google-cloud-storage	Level 2
internal-api	Level 3
istio	Level 2
jaeger	Level 2
kas	Level 3
kube	Level 2
logging	Level 1
mailgun	Level 3
mailroom	Level 1
memorystore	Level 1
mimir	Level 2
monitoring	Level 2
nat	Level 1
nginx	Level 1
ops-gitlab-net	Level 3
packagecloud	Level 1
patroni	Level 2
patroni-ci	Level 2
patroni-embedding	Level 1
patroni-registry	Level 1
pgbouncer	Level 1
pgbouncer-ci	Level 1
pgbouncer-embedding	Level 1
pgbouncer-registry	Level 1
plantuml	Level 1
postgres-archive	Level 1
redis	Level 3
redis-cluster-cache	Level 3
redis-cluster-chat-cache	Level 3
redis-cluster-feature-flag	Level 3
redis-cluster-queues-meta	Level 3
redis-cluster-ratelimiting	Level 3
redis-cluster-repo-cache	Level 3
redis-cluster-shared-state	Level 3
redis-db-load-balancing	Level 3
redis-pubsub	Level 3
redis-registry-cache	Level 2
redis-sessions	Level 3
redis-sidekiq	Level 3
redis-tracechunks	Level 3
registry	Level 2
runway	Level 1
search	Level 1
sentry	Level 2
sidekiq	Level 2
thanos	Level 2
tracing	Level 2
vault	Level 2
web	Level 2
web-pages	Level 2
websockets	Level 2
woodhouse	Level 3

Maturity detail by service

Key:

✅ Service meets the criterion
❌ Service does not meet the criterion
➖ The criterion is skipped because it does not apply to the service. For example, an infrastructure-facing service like Patroni is crucial to operations but unrelated to our Development department, so it does not require development guidelines.
⚪ We do not measure the criterion yet. For progress, see https://gitlab.com/groups/gitlab-com/gl-infra/-/epics/560
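For readers who want to process the detail tables programmatically, the sketch below maps a "Passed" cell to one of the four statuses in the key. It is an illustration only: `CellResult` and `parse_passed_cell` are hypothetical names, and the code assumes that the numbers following ✅ are references to evidence links and that skip explanations are prefixed with "Reason:", as they appear in the tables on this page.

```python
import re
from dataclasses import dataclass, field


@dataclass
class CellResult:
    status: str                                   # passed, failed, skipped, or unmeasured
    evidence: list = field(default_factory=list)  # reference numbers listed after the check mark
    reason: str = ""                              # explanation given for a skipped criterion


def parse_passed_cell(cell: str) -> CellResult:
    """Classify one cell of the 'Passed' column by its leading symbol."""
    text = cell.strip()
    if text.startswith("✅"):
        refs = [int(n) for n in re.findall(r"\d+", text)]
        return CellResult("passed", evidence=refs)
    if text.startswith("❌"):
        return CellResult("failed")
    if text.startswith("➖"):
        reason = text[len("➖"):].strip()
        if reason.startswith("Reason:"):
            reason = reason[len("Reason:"):].strip()
        return CellResult("skipped", reason=reason)
    if text.startswith("⚪"):
        return CellResult("unmeasured")
    raise ValueError(f"unrecognised status cell: {cell!r}")


if __name__ == "__main__":
    print(parse_passed_cell("✅ 1, 2, 3"))
    print(parse_passed_cell("➖ Reason: Mailgun is a vendor"))
```

Raising on an unknown symbol, rather than defaulting to "failed", keeps a change in the page format from silently lowering a service's score.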

ai-assisted detail

Level	Criterion	Passed
Level 1	Exists in the service catalog	✅ 1
	Structured logs available in Kibana	✅ 1, 2, 3, 4, 5, 6
	Service exists in the dependency graph	✅ 1
Level 2	SLO monitoring: apdex	✅ 1
	SLO monitoring: error rate	✅ 1
	SLO monitoring: request rate	✅ 1
Level 3	Service health dashboards	✅ 1
	SLA calculations driven from SLO metrics	⚪ Not Implemented
	All components include an apdex	✅ 1
	Logging includes metadata for measuring scalability	⚪ Not Implemented
	Developer guides exist in developer documentation	✅ 1
	SRE guides exist in runbooks	✅ 1
	Metrics on downstream service usage	⚪ Not Implemented
Level 4	Prepared Kibana dashboards	⚪ Not Implemented
	Dashboards linked from metrics catalogs	⚪ Not Implemented
	Automatic alert routing	⚪ Not Implemented
Level 5	Long-term forecasting utilization and usage	⚪ Not Implemented
	70% of requests covered by at least one SLI	⚪ Not Implemented
	Automatic alert routing	⚪ Not Implemented

ai-gateway detail

Level	Criterion	Passed
Level 1	Exists in the service catalog	✅ 1
	Structured logs available in Kibana	➖ Reason: Runway structured logs are temporarily available in Stackdriver
	Service exists in the dependency graph	➖ Reason: Runway services are deployed outside of the monolith
Level 2	SLO monitoring: apdex	✅ 1
	SLO monitoring: error rate	✅ 1
	SLO monitoring: request rate	✅ 1
Level 3	Service health dashboards	✅ 1, 2, 3, 4
	SLA calculations driven from SLO metrics	➖ Reason: Service is not user facing
	All components include an apdex	❌
	Logging includes metadata for measuring scalability	⚪ Not Implemented
	Developer guides exist in developer documentation	❌
	SRE guides exist in runbooks	❌
	Metrics on downstream service usage	⚪ Not Implemented
Level 4	Prepared Kibana dashboards	⚪ Not Implemented
	Dashboards linked from metrics catalogs	⚪ Not Implemented
	Automatic alert routing	⚪ Not Implemented
Level 5	Long-term forecasting utilization and usage	⚪ Not Implemented
	70% of requests covered by at least one SLI	⚪ Not Implemented
	Automatic alert routing	⚪ Not Implemented

api detail

Level	Criterion	Passed
Level 1	Exists in the service catalog	✅ 1
	Structured logs available in Kibana	✅ 1, 2, 3, 4, 5, 6, 7, 8, 9
	Service exists in the dependency graph	✅ 1
Level 2	SLO monitoring: apdex	✅ 1
	SLO monitoring: error rate	✅ 1
	SLO monitoring: request rate	✅ 1
Level 3	Service health dashboards	✅ 1, 2, 3, 4, 5, 6, 7, 8
	SLA calculations driven from SLO metrics	⚪ Not Implemented
	All components include an apdex	❌
	Logging includes metadata for measuring scalability	⚪ Not Implemented
	Developer guides exist in developer documentation	✅ 1
	SRE guides exist in runbooks	✅ 1
	Metrics on downstream service usage	⚪ Not Implemented
Level 4	Prepared Kibana dashboards	⚪ Not Implemented
	Dashboards linked from metrics catalogs	⚪ Not Implemented
	Automatic alert routing	⚪ Not Implemented
Level 5	Long-term forecasting utilization and usage	⚪ Not Implemented
	70% of requests covered by at least one SLI	⚪ Not Implemented
	Automatic alert routing	⚪ Not Implemented

atlantis detail

Level	Criterion	Passed
Level 1	Exists in the service catalog	✅ 1
	Structured logs available in Kibana	➖ Reason: Atlantis is a work in progress, see https://gitlab.com/gitlab-com/gl-infra/reliability/-/issues/24613
	Service exists in the dependency graph	➖ Reason: Atlantis is a work in progress, see https://gitlab.com/gitlab-com/gl-infra/reliability/-/issues/24613
Level 2	SLO monitoring: apdex	❌
	SLO monitoring: error rate	✅ 1
	SLO monitoring: request rate	✅ 1
Level 3	Service health dashboards	✅ 1
	SLA calculations driven from SLO metrics	➖ Reason: Service is not user facing
	All components include an apdex	❌
	Logging includes metadata for measuring scalability	⚪ Not Implemented
	Developer guides exist in developer documentation	➖ Reason: Atlantis is an infrastructure component, developers do not interact with it
	SRE guides exist in runbooks	✅ 1
	Metrics on downstream service usage	⚪ Not Implemented
Level 4	Prepared Kibana dashboards	⚪ Not Implemented
	Dashboards linked from metrics catalogs	⚪ Not Implemented
	Automatic alert routing	⚪ Not Implemented
Level 5	Long-term forecasting utilization and usage	⚪ Not Implemented
	70% of requests covered by at least one SLI	⚪ Not Implemented
	Automatic alert routing	⚪ Not Implemented

camoproxy detail

Level	Criterion	Passed
Level 1	Exists in the service catalog	✅ 1
	Structured logs available in Kibana	✅ 1, 2
	Service exists in the dependency graph	➖ Reason: Camoproxy does not interact directly with any declared services in our system
Level 2	SLO monitoring: apdex	✅ 1
	SLO monitoring: error rate	✅ 1
	SLO monitoring: request rate	✅ 1
Level 3	Service health dashboards	✅ 1
	SLA calculations driven from SLO metrics	➖ Reason: Service is not user facing
	All components include an apdex	❌
	Logging includes metadata for measuring scalability	⚪ Not Implemented
	Developer guides exist in developer documentation	✅ 1
	SRE guides exist in runbooks	✅ 1
	Metrics on downstream service usage	⚪ Not Implemented
Level 4	Prepared Kibana dashboards	⚪ Not Implemented
	Dashboards linked from metrics catalogs	⚪ Not Implemented
	Automatic alert routing	⚪ Not Implemented
Level 5	Long-term forecasting utilization and usage	⚪ Not Implemented
	70% of requests covered by at least one SLI	⚪ Not Implemented
	Automatic alert routing	⚪ Not Implemented

ci-runners detail

Level	Criterion	Passed
Level 1	Exists in the service catalog	✅ 1
	Structured logs available in Kibana	✅ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12
	Service exists in the dependency graph	✅ 1
Level 2	SLO monitoring: apdex	✅ 1
	SLO monitoring: error rate	✅ 1
	SLO monitoring: request rate	✅ 1
Level 3	Service health dashboards	✅ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11
	SLA calculations driven from SLO metrics	⚪ Not Implemented
	All components include an apdex	❌
	Logging includes metadata for measuring scalability	⚪ Not Implemented
	Developer guides exist in developer documentation	✅ 1
	SRE guides exist in runbooks	✅ 1
	Metrics on downstream service usage	⚪ Not Implemented
Level 4	Prepared Kibana dashboards	⚪ Not Implemented
	Dashboards linked from metrics catalogs	⚪ Not Implemented
	Automatic alert routing	⚪ Not Implemented
Level 5	Long-term forecasting utilization and usage	⚪ Not Implemented
	70% of requests covered by at least one SLI	⚪ Not Implemented
	Automatic alert routing	⚪ Not Implemented

cloud-sql detail

Level	Criterion	Passed
Level 1	Exists in the service catalog	✅ 1
	Structured logs available in Kibana	➖ Reason: Cloud SQL is a managed service of GCP. The logs are available in Stackdriver.
	Service exists in the dependency graph	✅ 1
Level 2	SLO monitoring: apdex	❌
	SLO monitoring: error rate	❌
	SLO monitoring: request rate	❌
Level 3	Service health dashboards	✅ 1
	SLA calculations driven from SLO metrics	➖ Reason: Service is not user facing
	All components include an apdex	❌
	Logging includes metadata for measuring scalability	⚪ Not Implemented
	Developer guides exist in developer documentation	➖ Reason: Cloud SQL is an infrastructure component, powered by GCP
	SRE guides exist in runbooks	✅ 1
	Metrics on downstream service usage	⚪ Not Implemented
Level 4	Prepared Kibana dashboards	⚪ Not Implemented
	Dashboards linked from metrics catalogs	⚪ Not Implemented
	Automatic alert routing	⚪ Not Implemented
Level 5	Long-term forecasting utilization and usage	⚪ Not Implemented
	70% of requests covered by at least one SLI	⚪ Not Implemented
	Automatic alert routing	⚪ Not Implemented

cloudflare detail

Level	Criterion	Passed
Level 1	Exists in the service catalog	✅ 1
	Structured logs available in Kibana	➖ Reason: Cloudflare pushes its logs to a GCS bucket, and they are not ingested into Elasticsearch due to volume. See https://gitlab.com/gitlab-com/runbooks/-/blob/master/docs/cloudflare/logging.md for alternatives
	Service exists in the dependency graph	✅ 1
Level 2	SLO monitoring: apdex	❌
	SLO monitoring: error rate	✅ 1
	SLO monitoring: request rate	✅ 1
Level 3	Service health dashboards	✅ 1
	SLA calculations driven from SLO metrics	➖ Reason: Service is not user facing
	All components include an apdex	❌
	Logging includes metadata for measuring scalability	⚪ Not Implemented
	Developer guides exist in developer documentation	➖ Reason: WAF is an infrastructure component, powered by Cloudflare
	SRE guides exist in runbooks	✅ 1
	Metrics on downstream service usage	⚪ Not Implemented
Level 4	Prepared Kibana dashboards	⚪ Not Implemented
	Dashboards linked from metrics catalogs	⚪ Not Implemented
	Automatic alert routing	⚪ Not Implemented
Level 5	Long-term forecasting utilization and usage	⚪ Not Implemented
	70% of requests covered by at least one SLI	⚪ Not Implemented
	Automatic alert routing	⚪ Not Implemented

consul detail

Level	Criterion	Passed
Level 1	Exists in the service catalog	✅ 1
	Structured logs available in Kibana	✅ 1
	Service exists in the dependency graph	✅ 1
Level 2	SLO monitoring: apdex	❌
	SLO monitoring: error rate	✅ 1
	SLO monitoring: request rate	✅ 1
Level 3	Service health dashboards	✅ 1, 2, 3, 4, 5, 6
	SLA calculations driven from SLO metrics	➖ Reason: Service is not user facing
	All components include an apdex	❌
	Logging includes metadata for measuring scalability	⚪ Not Implemented
	Developer guides exist in developer documentation	➖ Reason: Consul is an infrastructure component, developers do not interact with it
	SRE guides exist in runbooks	✅ 1
	Metrics on downstream service usage	⚪ Not Implemented
Level 4	Prepared Kibana dashboards	⚪ Not Implemented
	Dashboards linked from metrics catalogs	⚪ Not Implemented
	Automatic alert routing	⚪ Not Implemented
Level 5	Long-term forecasting utilization and usage	⚪ Not Implemented
	70% of requests covered by at least one SLI	⚪ Not Implemented
	Automatic alert routing	⚪ Not Implemented

customersdot detail

Level	Criterion	Passed
Level 1	Exists in the service catalog	✅ 1
	Structured logs available in Kibana	➖ Reason: All logs are available in Stackdriver
	Service exists in the dependency graph	✅ 1
Level 2	SLO monitoring: apdex	✅ 1
	SLO monitoring: error rate	✅ 1
	SLO monitoring: request rate	✅ 1
Level 3	Service health dashboards	✅ 1
	SLA calculations driven from SLO metrics	➖ Reason: Service is not user facing
	All components include an apdex	✅ 1
	Logging includes metadata for measuring scalability	⚪ Not Implemented
	Developer guides exist in developer documentation	✅ 1
	SRE guides exist in runbooks	✅ 1
	Metrics on downstream service usage	⚪ Not Implemented
Level 4	Prepared Kibana dashboards	⚪ Not Implemented
	Dashboards linked from metrics catalogs	⚪ Not Implemented
	Automatic alert routing	⚪ Not Implemented
Level 5	Long-term forecasting utilization and usage	⚪ Not Implemented
	70% of requests covered by at least one SLI	⚪ Not Implemented
	Automatic alert routing	⚪ Not Implemented

errortracking detail

Level	Criterion	Passed
Level 1	Exists in the service catalog	✅ 1
	Structured logs available in Kibana	✅ 1
	Service exists in the dependency graph	✅ 1
Level 2	SLO monitoring: apdex	✅ 1
	SLO monitoring: error rate	✅ 1
	SLO monitoring: request rate	✅ 1
Level 3	Service health dashboards	✅ 1
	SLA calculations driven from SLO metrics	➖ Reason: Service is not user facing
	All components include an apdex	✅ 1
	Logging includes metadata for measuring scalability	⚪ Not Implemented
	Developer guides exist in developer documentation	✅ 1
	SRE guides exist in runbooks	❌
	Metrics on downstream service usage	⚪ Not Implemented
Level 4	Prepared Kibana dashboards	⚪ Not Implemented
	Dashboards linked from metrics catalogs	⚪ Not Implemented
	Automatic alert routing	⚪ Not Implemented
Level 5	Long-term forecasting utilization and usage	⚪ Not Implemented
	70% of requests covered by at least one SLI	⚪ Not Implemented
	Automatic alert routing	⚪ Not Implemented

ext-pvs detail

Level	Criterion	Passed
Level 1	Exists in the service catalog	✅ 1
	Structured logs available in Kibana	➖ Reason: Runway structured logs are temporarily available in Stackdriver
	Service exists in the dependency graph	➖ Reason: Runway services are deployed outside of the monolith
Level 2	SLO monitoring: apdex	✅ 1
	SLO monitoring: error rate	✅ 1
	SLO monitoring: request rate	✅ 1
Level 3	Service health dashboards	✅ 1
	SLA calculations driven from SLO metrics	➖ Reason: Service is not user facing
	All components include an apdex	✅ 1
	Logging includes metadata for measuring scalability	⚪ Not Implemented
	Developer guides exist in developer documentation	✅ 1
	SRE guides exist in runbooks	✅ 1
	Metrics on downstream service usage	⚪ Not Implemented
Level 4	Prepared Kibana dashboards	⚪ Not Implemented
	Dashboards linked from metrics catalogs	⚪ Not Implemented
	Automatic alert routing	⚪ Not Implemented
Level 5	Long-term forecasting utilization and usage	⚪ Not Implemented
	70% of requests covered by at least one SLI	⚪ Not Implemented
	Automatic alert routing	⚪ Not Implemented

external-dns detail

Level	Criterion	Passed
Level 1	Exists in the service catalog	✅ 1
	Structured logs available in Kibana	➖ Reason: Logs from external-dns are not ingested into Elasticsearch due to volume. The logs are available in Stackdriver
	Service exists in the dependency graph	✅ 1
Level 2	SLO monitoring: apdex	❌
	SLO monitoring: error rate	❌
	SLO monitoring: request rate	❌
Level 3	Service health dashboards	✅ 1
	SLA calculations driven from SLO metrics	➖ Reason: Service is not user facing
	All components include an apdex	❌
	Logging includes metadata for measuring scalability	⚪ Not Implemented
	Developer guides exist in developer documentation	➖ Reason: external-dns is an infrastructure component, developers do not interact with it
	SRE guides exist in runbooks	✅ 1
	Metrics on downstream service usage	⚪ Not Implemented
Level 4	Prepared Kibana dashboards	⚪ Not Implemented
	Dashboards linked from metrics catalogs	⚪ Not Implemented
	Automatic alert routing	⚪ Not Implemented
Level 5	Long-term forecasting utilization and usage	⚪ Not Implemented
	70% of requests covered by at least one SLI	⚪ Not Implemented
	Automatic alert routing	⚪ Not Implemented

frontend detail

Level	Criterion	Passed
Level 1	Exists in the service catalog	✅ 1
	Structured logs available in Kibana	➖ Reason: Logs from HAProxy are available in BigQuery, and are not ingested into Elasticsearch due to volume.
	Service exists in the dependency graph	✅ 1
Level 2	SLO monitoring: apdex	❌
	SLO monitoring: error rate	✅ 1
	SLO monitoring: request rate	✅ 1
Level 3	Service health dashboards	✅ 1, 2
	SLA calculations driven from SLO metrics	➖ Reason: Service is not user facing
	All components include an apdex	❌
	Logging includes metadata for measuring scalability	⚪ Not Implemented
	Developer guides exist in developer documentation	✅ 1
	SRE guides exist in runbooks	✅ 1
	Metrics on downstream service usage	⚪ Not Implemented
Level 4	Prepared Kibana dashboards	⚪ Not Implemented
	Dashboards linked from metrics catalogs	⚪ Not Implemented
	Automatic alert routing	⚪ Not Implemented
Level 5	Long-term forecasting utilization and usage	⚪ Not Implemented
	70% of requests covered by at least one SLI	⚪ Not Implemented
	Automatic alert routing	⚪ Not Implemented

git detail

Level	Criterion	Passed
Level 1	Exists in the service catalog	✅ 1
	Structured logs available in Kibana	✅ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13
	Service exists in the dependency graph	✅ 1
Level 2	SLO monitoring: apdex	✅ 1
	SLO monitoring: error rate	✅ 1
	SLO monitoring: request rate	✅ 1
Level 3	Service health dashboards	✅ 1, 2, 3, 4, 5, 6, 7, 8
	SLA calculations driven from SLO metrics	⚪ Not Implemented
	All components include an apdex	❌
	Logging includes metadata for measuring scalability	⚪ Not Implemented
	Developer guides exist in developer documentation	✅ 1
	SRE guides exist in runbooks	✅ 1
	Metrics on downstream service usage	⚪ Not Implemented
Level 4	Prepared Kibana dashboards	⚪ Not Implemented
	Dashboards linked from metrics catalogs	⚪ Not Implemented
	Automatic alert routing	⚪ Not Implemented
Level 5	Long-term forecasting utilization and usage	⚪ Not Implemented
	70% of requests covered by at least one SLI	⚪ Not Implemented
	Automatic alert routing	⚪ Not Implemented

gitaly detail

Level	Criterion	Passed
Level 1	Exists in the service catalog	✅ 1
	Structured logs available in Kibana	✅ 1, 2, 3
	Service exists in the dependency graph	✅ 1
Level 2	SLO monitoring: apdex	✅ 1
	SLO monitoring: error rate	✅ 1
	SLO monitoring: request rate	✅ 1
Level 3	Service health dashboards	✅ 1, 2, 3, 4
	SLA calculations driven from SLO metrics	➖ Reason: Service is not user facing
	All components include an apdex	✅ 1
	Logging includes metadata for measuring scalability	⚪ Not Implemented
	Developer guides exist in developer documentation	✅ 1
	SRE guides exist in runbooks	✅ 1
	Metrics on downstream service usage	⚪ Not Implemented
Level 4	Prepared Kibana dashboards	⚪ Not Implemented
	Dashboards linked from metrics catalogs	⚪ Not Implemented
	Automatic alert routing	⚪ Not Implemented
Level 5	Long-term forecasting utilization and usage	⚪ Not Implemented
	70% of requests covered by at least one SLI	⚪ Not Implemented
	Automatic alert routing	⚪ Not Implemented

gitlab-static detail

Level	Criterion	Passed
Level 1	Exists in the service catalog	✅ 1
	Structured logs available in Kibana	➖ Reason: Logs from Cloudflare Workers are available on demand, but they are not ingested due to volume
	Service exists in the dependency graph	➖ Reason: This service is hosted by Cloudflare and does not depend on any other service
Level 2	SLO monitoring: apdex	❌
	SLO monitoring: error rate	✅ 1
	SLO monitoring: request rate	✅ 1
Level 3	Service health dashboards	✅ 1
	SLA calculations driven from SLO metrics	➖ Reason: Service is not user facing
	All components include an apdex	❌
	Logging includes metadata for measuring scalability	⚪ Not Implemented
	Developer guides exist in developer documentation	⚪ Not Implemented
	SRE guides exist in runbooks	✅ 1
	Metrics on downstream service usage	⚪ Not Implemented
Level 4	Prepared Kibana dashboards	⚪ Not Implemented
	Dashboards linked from metrics catalogs	⚪ Not Implemented
	Automatic alert routing	⚪ Not Implemented
Level 5	Long-term forecasting utilization and usage	⚪ Not Implemented
	70% of requests covered by at least one SLI	⚪ Not Implemented
	Automatic alert routing	⚪ Not Implemented

glgo detail

Level	Criterion	Passed
Level 1	Exists in the service catalog	✅ 1
	Structured logs available in Kibana	➖ Reason: Runway structured logs are temporarily available in Stackdriver
	Service exists in the dependency graph	➖ Reason: Runway services are deployed outside of the monolith
Level 2	SLO monitoring: apdex	✅ 1
	SLO monitoring: error rate	✅ 1
	SLO monitoring: request rate	✅ 1
Level 3	Service health dashboards	✅ 1
	SLA calculations driven from SLO metrics	➖ Reason: Service is not user facing
	All components include an apdex	❌
	Logging includes metadata for measuring scalability	⚪ Not Implemented
	Developer guides exist in developer documentation	❌
	SRE guides exist in runbooks	❌
	Metrics on downstream service usage	⚪ Not Implemented
Level 4	Prepared Kibana dashboards	⚪ Not Implemented
	Dashboards linked from metrics catalogs	⚪ Not Implemented
	Automatic alert routing	⚪ Not Implemented
Level 5	Long-term forecasting utilization and usage	⚪ Not Implemented
	70% of requests covered by at least one SLI	⚪ Not Implemented
	Automatic alert routing	⚪ Not Implemented

google-cloud-storage detail

Level	Criterion	Passed
Level 1	Exists in the service catalog	✅ 1
	Structured logs available in Kibana	➖ Reason: Access logs for GCS are not enabled due to volume.
	Service exists in the dependency graph	✅ 1
Level 2	SLO monitoring: apdex	✅ 1
	SLO monitoring: error rate	✅ 1
	SLO monitoring: request rate	✅ 1
Level 3	Service health dashboards	✅ 1, 2
	SLA calculations driven from SLO metrics	➖ Reason: Service is not user facing
	All components include an apdex	❌
	Logging includes metadata for measuring scalability	⚪ Not Implemented
	Developer guides exist in developer documentation	✅ 1
	SRE guides exist in runbooks	✅ 1
	Metrics on downstream service usage	⚪ Not Implemented
Level 4	Prepared Kibana dashboards	⚪ Not Implemented
	Dashboards linked from metrics catalogs	⚪ Not Implemented
	Automatic alert routing	⚪ Not Implemented
Level 5	Long-term forecasting utilization and usage	⚪ Not Implemented
	70% of requests covered by at least one SLI	⚪ Not Implemented
	Automatic alert routing	⚪ Not Implemented

internal-api detail

Level	Criterion	Passed
Level 1	Exists in the service catalog	✅ 1
	Structured logs available in Kibana	✅ 1, 2, 3, 4, 5, 6
	Service exists in the dependency graph	✅ 1
Level 2	SLO monitoring: apdex	✅ 1
	SLO monitoring: error rate	✅ 1
	SLO monitoring: request rate	✅ 1
Level 3	Service health dashboards	✅ 1, 2, 3, 4, 5, 6, 7
	SLA calculations driven from SLO metrics	⚪ Not Implemented
	All components include an apdex	✅ 1
	Logging includes metadata for measuring scalability	⚪ Not Implemented
	Developer guides exist in developer documentation	✅ 1
	SRE guides exist in runbooks	✅ 1
	Metrics on downstream service usage	⚪ Not Implemented
Level 4	Prepared Kibana dashboards	⚪ Not Implemented
	Dashboards linked from metrics catalogs	⚪ Not Implemented
	Automatic alert routing	⚪ Not Implemented
Level 5	Long-term forecasting utilization and usage	⚪ Not Implemented
	70% of requests covered by at least one SLI	⚪ Not Implemented
	Automatic alert routing	⚪ Not Implemented

istio detail

Level	Criterion	Passed
Level 1	Exists in the service catalog	✅ 1
	Structured logs available in Kibana	➖ Reason: Istio service is not deployed in production
	Service exists in the dependency graph	➖ Reason: This service does not interact directly with any other services
Level 2	SLO monitoring: apdex	✅ 1
	SLO monitoring: error rate	✅ 1
	SLO monitoring: request rate	✅ 1
Level 3	Service health dashboards	✅ 1, 2, 3, 4, 5, 6, 7
	SLA calculations driven from SLO metrics	➖ Reason: Service is not user facing
	All components include an apdex	❌
	Logging includes metadata for measuring scalability	⚪ Not Implemented
	Developer guides exist in developer documentation	➖ Reason: Istio is an infrastructure component, developers do not interact with it
	SRE guides exist in runbooks	✅ 1
	Metrics on downstream service usage	⚪ Not Implemented
Level 4	Prepared Kibana dashboards	⚪ Not Implemented
	Dashboards linked from metrics catalogs	⚪ Not Implemented
	Automatic alert routing	⚪ Not Implemented
Level 5	Long-term forecasting utilization and usage	⚪ Not Implemented
	70% of requests covered by at least one SLI	⚪ Not Implemented
	Automatic alert routing	⚪ Not Implemented

jaeger detail

Level	Criterion	Passed
Level 1	Exists in the service catalog	✅ 1
	Structured logs available in Kibana	➖ Reason: Jaeger service is not deployed in production
	Service exists in the dependency graph	➖ Reason: Jaeger is an independent internal observability tool
Level 2	SLO monitoring: apdex	✅ 1
	SLO monitoring: error rate	✅ 1
	SLO monitoring: request rate	✅ 1
Level 3	Service health dashboards	✅ 1
	SLA calculations driven from SLO metrics	➖ Reason: Service is not user facing
	All components include an apdex	❌
	Logging includes metadata for measuring scalability	⚪ Not Implemented
	Developer guides exist in developer documentation	✅ 1
	SRE guides exist in runbooks	✅ 1
	Metrics on downstream service usage	⚪ Not Implemented
Level 4	Prepared Kibana dashboards	⚪ Not Implemented
	Dashboards linked from metrics catalogs	⚪ Not Implemented
	Automatic alert routing	⚪ Not Implemented
Level 5	Long-term forecasting utilization and usage	⚪ Not Implemented
	70% of requests covered by at least one SLI	⚪ Not Implemented
	Automatic alert routing	⚪ Not Implemented

kas detail

Level	Criterion	Passed
Level 1	Exists in the service catalog	✅ 1
	Structured logs available in Kibana	✅ 1, 2
	Service exists in the dependency graph	✅ 1
Level 2	SLO monitoring: apdex	✅ 1
	SLO monitoring: error rate	✅ 1
	SLO monitoring: request rate	✅ 1
Level 3	Service health dashboards	✅ 1, 2, 3, 4, 5, 6, 7, 8
	SLA calculations driven from SLO metrics	➖ Reason: Service is not user facing
	All components include an apdex	✅ 1
	Logging includes metadata for measuring scalability	⚪ Not Implemented
	Developer guides exist in developer documentation	✅ 1
	SRE guides exist in runbooks	✅ 1
	Metrics on downstream service usage	⚪ Not Implemented
Level 4	Prepared Kibana dashboards	⚪ Not Implemented
	Dashboards linked from metrics catalogs	⚪ Not Implemented
	Automatic alert routing	⚪ Not Implemented
Level 5	Long-term forecasting utilization and usage	⚪ Not Implemented
	70% of requests covered by at least one SLI	⚪ Not Implemented
	Automatic alert routing	⚪ Not Implemented

kube detail

Level	Criterion	Passed
Level 1	Exists in the service catalog	✅ 1
	Structured logs available in Kibana	✅ 1
	Service exists in the dependency graph	➖ Reason: This service is managed by GKE at the moment. It does not interact directly with any other services
Level 2	SLO monitoring: apdex	✅ 1
	SLO monitoring: error rate	✅ 1
	SLO monitoring: request rate	✅ 1
Level 3	Service health dashboards	✅ 1
	SLA calculations driven from SLO metrics	➖ Reason: Service is not user facing
	All components include an apdex	❌
	Logging includes metadata for measuring scalability	⚪ Not Implemented
	Developer guides exist in developer documentation	➖ Reason: Application logic does not interact with kube
	SRE guides exist in runbooks	✅ 1
	Metrics on downstream service usage	⚪ Not Implemented
Level 4	Prepared Kibana dashboards	⚪ Not Implemented
	Dashboards linked from metrics catalogs	⚪ Not Implemented
	Automatic alert routing	⚪ Not Implemented
Level 5	Long-term forecasting utilization and usage	⚪ Not Implemented
	70% of requests covered by at least one SLI	⚪ Not Implemented
	Automatic alert routing	⚪ Not Implemented

logging detail

Level	Criterion	Passed
Level 1	Exists in the service catalog	✅ 1
	Structured logs available in Kibana	✅ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11
	Service exists in the dependency graph	➖ Reason: The logging platform consumes logs via fluentd, but does not interact directly with any other services
Level 2	SLO monitoring: apdex	❌
	SLO monitoring: error rate	✅ 1
	SLO monitoring: request rate	✅ 1
Level 3	Service health dashboards	✅ 1, 2, 3, 4
	SLA calculations driven from SLO metrics	➖ Reason: Service is not user facing
	All components include an apdex	❌
	Logging includes metadata for measuring scalability	⚪ Not Implemented
	Developer guides exist in developer documentation	✅ 1
	SRE guides exist in runbooks	✅ 1
	Metrics on downstream service usage	⚪ Not Implemented
Level 4	Prepared Kibana dashboards	⚪ Not Implemented
	Dashboards linked from metrics catalogs	⚪ Not Implemented
	Automatic alert routing	⚪ Not Implemented
Level 5	Long-term forecasting utilization and usage	⚪ Not Implemented
	70% of requests covered by at least one SLI	⚪ Not Implemented
	Automatic alert routing	⚪ Not Implemented

mailgun detail

Level	Criterion	Passed
Level 1	Exists in the service catalog	✅ 1
	Structured logs available in Kibana	➖ Reason: Mailgun is a vendor
	Service exists in the dependency graph	➖ Reason: Mailgun is a vendor
Level 2	SLO monitoring: apdex	✅ 1
	SLO monitoring: error rate	✅ 1
	SLO monitoring: request rate	✅ 1
Level 3	Service health dashboards	✅ 1
	SLA calculations driven from SLO metrics	➖ Reason: Service is not user facing
	All components include an apdex	✅ 1
	Logging includes metadata for measuring scalability	⚪ Not Implemented
	Developer guides exist in developer documentation	⚪ Not Implemented
	SRE guides exist in runbooks	⚪ Not Implemented
	Metrics on downstream service usage	⚪ Not Implemented
Level 4	Prepared Kibana dashboards	⚪ Not Implemented
	Dashboards linked from metrics catalogs	⚪ Not Implemented
	Automatic alert routing	⚪ Not Implemented
Level 5	Long-term forecasting utilization and usage	⚪ Not Implemented
	70% of requests covered by at least one SLI	⚪ Not Implemented
	Automatic alert routing	⚪ Not Implemented

mailroom detail

Level	Criterion	Passed
Level 1	Exists in the service catalog	✅ 1
	Structured logs available in Kibana	✅ 1, 2, 3, 4, 5
	Service exists in the dependency graph	✅ 1
Level 2	SLO monitoring: apdex	❌
	SLO monitoring: error rate	✅ 1
	SLO monitoring: request rate	✅ 1
Level 3	Service health dashboards	✅ 1, 2, 3, 4, 5
	SLA calculations driven from SLO metrics	➖ Reason: Service is not user facing
	All components include an apdex	❌
	Logging includes metadata for measuring scalability	⚪ Not Implemented
	Developer guides exist in developer documentation	✅ 1
	SRE guides exist in runbooks	✅ 1
	Metrics on downstream service usage	⚪ Not Implemented
Level 4	Prepared Kibana dashboards	⚪ Not Implemented
	Dashboards linked from metrics catalogs	⚪ Not Implemented
	Automatic alert routing	⚪ Not Implemented
Level 5	Long-term forecasting utilization and usage	⚪ Not Implemented
	70% of requests covered by at least one SLI	⚪ Not Implemented
	Automatic alert routing	⚪ Not Implemented

memorystore detail

Level	Criterion	Passed
Level 1	Exists in the service catalog	✅ 1
	Structured logs available in Kibana	➖ Reason: Memorystore is a managed service of GCP. The logs are available in Stackdriver.
	Service exists in the dependency graph	✅ 1
Level 2	SLO monitoring: apdex	❌
	SLO monitoring: error rate	❌
	SLO monitoring: request rate	❌
Level 3	Service health dashboards	✅ 1
	SLA calculations driven from SLO metrics	➖ Reason: Service is not user facing
	All components include an apdex	❌
	Logging includes metadata for measuring scalability	⚪ Not Implemented
	Developer guides exist in developer documentation	➖ Reason: Memorystore is an infrastructure component, powered by GCP
	SRE guides exist in runbooks	✅ 1
	Metrics on downstream service usage	⚪ Not Implemented
Level 4	Prepared Kibana dashboards	⚪ Not Implemented
	Dashboards linked from metrics catalogs	⚪ Not Implemented
	Automatic alert routing	⚪ Not Implemented
Level 5	Long-term forecasting utilization and usage	⚪ Not Implemented
	70% of requests covered by at least one SLI	⚪ Not Implemented
	Automatic alert routing	⚪ Not Implemented

mimir detail

Level	Criterion	Passed
Level 1	Exists in the service catalog	✅ 1
	Structured logs available in Kibana	✅ 1, 2, 3, 4, 5, 6, 7, 8
	Service exists in the dependency graph	➖ Reason: Mimir is an independent internal observability tool. It fetches metrics from other services but does not otherwise interact with them functionally.
Level 2	SLO monitoring: apdex	✅ 1
	SLO monitoring: error rate	✅ 1
	SLO monitoring: request rate	✅ 1
Level 3	Service health dashboards	✅ 1
	SLA calculations driven from SLO metrics	➖ Reason: Service is not user facing
	All components include an apdex	❌
	Logging includes metadata for measuring scalability	⚪ Not Implemented
	Developer guides exist in developer documentation	⚪ Not Implemented
	SRE guides exist in runbooks	✅ 1
	Metrics on downstream service usage	⚪ Not Implemented
Level 4	Prepared Kibana dashboards	⚪ Not Implemented
	Dashboards linked from metrics catalogs	⚪ Not Implemented
	Automatic alert routing	⚪ Not Implemented
Level 5	Long-term forecasting utilization and usage	⚪ Not Implemented
	70% of requests covered by at least one SLI	⚪ Not Implemented
	Automatic alert routing	⚪ Not Implemented

monitoring detail

Level	Criterion	Passed
Level 1	Exists in the service catalog	✅ 1
	Structured logs available in Kibana	✅ 1, 2, 3, 4
	Service exists in the dependency graph	✅ 1
Level 2	SLO monitoring: apdex	✅ 1
	SLO monitoring: error rate	✅ 1
	SLO monitoring: request rate	✅ 1
Level 3	Service health dashboards	✅ 1, 2
	SLA calculations driven from SLO metrics	➖ Reason: Service is not user facing
	All components include an apdex	❌
	Logging includes metadata for measuring scalability	⚪ Not Implemented
	Developer guides exist in developer documentation	✅ 1
	SRE guides exist in runbooks	✅ 1
	Metrics on downstream service usage	⚪ Not Implemented
Level 4	Prepared Kibana dashboards	⚪ Not Implemented
	Dashboards linked from metrics catalogs	⚪ Not Implemented
	Automatic alert routing	⚪ Not Implemented
Level 5	Long-term forecasting utilization and usage	⚪ Not Implemented
	70% of requests covered by at least one SLI	⚪ Not Implemented
	Automatic alert routing	⚪ Not Implemented

nat detail

Level	Criterion	Passed
Level 1	Exists in the service catalog	✅ 1
	Structured logs available in Kibana	➖ Reason: NAT is managed by GCP, so the logs are available in Stackdriver.
	Service exists in the dependency graph	✅ 1
Level 2	SLO monitoring: apdex	❌
	SLO monitoring: error rate	✅ 1
	SLO monitoring: request rate	✅ 1
Level 3	Service health dashboards	✅ 1, 2
	SLA calculations driven from SLO metrics	➖ Reason: Service is not user facing
	All components include an apdex	❌
	Logging includes metadata for measuring scalability	⚪ Not Implemented
	Developer guides exist in developer documentation	➖ Reason: NAT is an infrastructure component, developers do not interact with it
	SRE guides exist in runbooks	✅ 1
	Metrics on downstream service usage	⚪ Not Implemented
Level 4	Prepared Kibana dashboards	⚪ Not Implemented
	Dashboards linked from metrics catalogs	⚪ Not Implemented
	Automatic alert routing	⚪ Not Implemented
Level 5	Long-term forecasting utilization and usage	⚪ Not Implemented
	70% of requests covered by at least one SLI	⚪ Not Implemented
	Automatic alert routing	⚪ Not Implemented

nginx detail

Level	Criterion	Passed
Level 1	Exists in the service catalog	✅ 1
	Structured logs available in Kibana	➖ Reason: Logs from nginx are not ingested into Elasticsearch due to volume. Workhorse logs usually cover the same ground, and the nginx logs are also available in Stackdriver.
	Service exists in the dependency graph	✅ 1
Level 2	SLO monitoring: apdex	❌
	SLO monitoring: error rate	✅ 1
	SLO monitoring: request rate	✅ 1
Level 3	Service health dashboards	✅ 1, 2, 3, 4, 5, 6
	SLA calculations driven from SLO metrics	➖ Reason: Service is not user facing
	All components include an apdex	❌
	Logging includes metadata for measuring scalability	⚪ Not Implemented
	Developer guides exist in developer documentation	➖ Reason: Application logic does not interact with nginx
	SRE guides exist in runbooks	✅ 1
	Metrics on downstream service usage	⚪ Not Implemented
Level 4	Prepared Kibana dashboards	⚪ Not Implemented
	Dashboards linked from metrics catalogs	⚪ Not Implemented
	Automatic alert routing	⚪ Not Implemented
Level 5	Long-term forecasting utilization and usage	⚪ Not Implemented
	70% of requests covered by at least one SLI	⚪ Not Implemented
	Automatic alert routing	⚪ Not Implemented

ops-gitlab-net detail

Level	Criterion	Passed
Level 1	Exists in the service catalog	✅ 1
	Structured logs available in Kibana	✅ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19
	Service exists in the dependency graph	➖ Reason: ops.gitlab.net is a standalone GitLab deployment
Level 2	SLO monitoring: apdex	✅ 1
	SLO monitoring: error rate	✅ 1
	SLO monitoring: request rate	✅ 1
Level 3	Service health dashboards	✅ 1, 2, 3, 4
	SLA calculations driven from SLO metrics	➖ Reason: Service is not user facing
	All components include an apdex	✅ 1
	Logging includes metadata for measuring scalability	⚪ Not Implemented
	Developer guides exist in developer documentation	⚪ Not Implemented
	SRE guides exist in runbooks	⚪ Not Implemented
	Metrics on downstream service usage	⚪ Not Implemented
Level 4	Prepared Kibana dashboards	⚪ Not Implemented
	Dashboards linked from metrics catalogs	⚪ Not Implemented
	Automatic alert routing	⚪ Not Implemented
Level 5	Long-term forecasting utilization and usage	⚪ Not Implemented
	70% of requests covered by at least one SLI	⚪ Not Implemented
	Automatic alert routing	⚪ Not Implemented

packagecloud detail

Level	Criterion	Passed
Level 1	Exists in the service catalog	✅ 1
	Structured logs available in Kibana	✅ 1, 2
	Service exists in the dependency graph	✅ 1
Level 2	SLO monitoring: apdex	❌
	SLO monitoring: error rate	✅ 1
	SLO monitoring: request rate	✅ 1
Level 3	Service health dashboards	✅ 1, 2, 3, 4
	SLA calculations driven from SLO metrics	➖ Reason: Service is not user facing
	All components include an apdex	❌
	Logging includes metadata for measuring scalability	⚪ Not Implemented
	Developer guides exist in developer documentation	⚪ Not Implemented
	SRE guides exist in runbooks	✅ 1
	Metrics on downstream service usage	⚪ Not Implemented
Level 4	Prepared Kibana dashboards	⚪ Not Implemented
	Dashboards linked from metrics catalogs	⚪ Not Implemented
	Automatic alert routing	⚪ Not Implemented
Level 5	Long-term forecasting utilization and usage	⚪ Not Implemented
	70% of requests covered by at least one SLI	⚪ Not Implemented
	Automatic alert routing	⚪ Not Implemented

patroni detail

Level	Criterion	Passed
Level 1	Exists in the service catalog	✅ 1
	Structured logs available in Kibana	✅ 1, 2, 3, 4, 5
	Service exists in the dependency graph	✅ 1
Level 2	SLO monitoring: apdex	✅ 1
	SLO monitoring: error rate	✅ 1
	SLO monitoring: request rate	✅ 1
Level 3	Service health dashboards	✅ 1, 2, 3
	SLA calculations driven from SLO metrics	➖ Reason: Service is not user facing
	All components include an apdex	❌
	Logging includes metadata for measuring scalability	⚪ Not Implemented
	Developer guides exist in developer documentation	➖ Reason: patroni is an infrastructure component, developers do not interact with it
	SRE guides exist in runbooks	✅ 1
	Metrics on downstream service usage	⚪ Not Implemented
Level 4	Prepared Kibana dashboards	⚪ Not Implemented
	Dashboards linked from metrics catalogs	⚪ Not Implemented
	Automatic alert routing	⚪ Not Implemented
Level 5	Long-term forecasting utilization and usage	⚪ Not Implemented
	70% of requests covered by at least one SLI	⚪ Not Implemented
	Automatic alert routing	⚪ Not Implemented

patroni-ci detail

Level	Criterion	Passed
Level 1	Exists in the service catalog	✅ 1
	Structured logs available in Kibana	✅ 1, 2, 3, 4, 5
	Service exists in the dependency graph	✅ 1
Level 2	SLO monitoring: apdex	✅ 1
	SLO monitoring: error rate	✅ 1
	SLO monitoring: request rate	✅ 1
Level 3	Service health dashboards	✅ 1
	SLA calculations driven from SLO metrics	➖ Reason: Service is not user facing
	All components include an apdex	❌
	Logging includes metadata for measuring scalability	⚪ Not Implemented
	Developer guides exist in developer documentation	➖ Reason: patroni is an infrastructure component, developers do not interact with it
	SRE guides exist in runbooks	✅ 1
	Metrics on downstream service usage	⚪ Not Implemented
Level 4	Prepared Kibana dashboards	⚪ Not Implemented
	Dashboards linked from metrics catalogs	⚪ Not Implemented
	Automatic alert routing	⚪ Not Implemented
Level 5	Long-term forecasting utilization and usage	⚪ Not Implemented
	70% of requests covered by at least one SLI	⚪ Not Implemented
	Automatic alert routing	⚪ Not Implemented

patroni-embedding detail

Level	Criterion	Passed
Level 1	Exists in the service catalog	✅ 1
	Structured logs available in Kibana	✅ 1, 2, 3, 4, 5
	Service exists in the dependency graph	✅ 1
Level 2	SLO monitoring: apdex	❌
	SLO monitoring: error rate	✅ 1
	SLO monitoring: request rate	✅ 1
Level 3	Service health dashboards	✅ 1
	SLA calculations driven from SLO metrics	➖ Reason: Service is not user facing
	All components include an apdex	❌
	Logging includes metadata for measuring scalability	⚪ Not Implemented
	Developer guides exist in developer documentation	➖ Reason: patroni is an infrastructure component, developers do not interact with it
	SRE guides exist in runbooks	✅ 1
	Metrics on downstream service usage	⚪ Not Implemented
Level 4	Prepared Kibana dashboards	⚪ Not Implemented
	Dashboards linked from metrics catalogs	⚪ Not Implemented
	Automatic alert routing	⚪ Not Implemented
Level 5	Long-term forecasting utilization and usage	⚪ Not Implemented
	70% of requests covered by at least one SLI	⚪ Not Implemented
	Automatic alert routing	⚪ Not Implemented

patroni-registry detail

Level	Criterion	Passed
Level 1	Exists in the service catalog	✅ 1
	Structured logs available in Kibana	✅ 1, 2, 3, 4, 5
	Service exists in the dependency graph	✅ 1
Level 2	SLO monitoring: apdex	❌
	SLO monitoring: error rate	✅ 1
	SLO monitoring: request rate	✅ 1
Level 3	Service health dashboards	✅ 1
	SLA calculations driven from SLO metrics	➖ Reason: Service is not user facing
	All components include an apdex	❌
	Logging includes metadata for measuring scalability	⚪ Not Implemented
	Developer guides exist in developer documentation	➖ Reason: patroni is an infrastructure component, developers do not interact with it
	SRE guides exist in runbooks	✅ 1
	Metrics on downstream service usage	⚪ Not Implemented
Level 4	Prepared Kibana dashboards	⚪ Not Implemented
	Dashboards linked from metrics catalogs	⚪ Not Implemented
	Automatic alert routing	⚪ Not Implemented
Level 5	Long-term forecasting utilization and usage	⚪ Not Implemented
	70% of requests covered by at least one SLI	⚪ Not Implemented
	Automatic alert routing	⚪ Not Implemented

pgbouncer detail

Level	Criterion	Passed
Level 1	Exists in the service catalog	✅ 1
	Structured logs available in Kibana	✅ 1
	Service exists in the dependency graph	✅ 1
Level 2	SLO monitoring: apdex	❌
	SLO monitoring: error rate	✅ 1
	SLO monitoring: request rate	✅ 1
Level 3	Service health dashboards	✅ 1, 2
	SLA calculations driven from SLO metrics	➖ Reason: Service is not user facing
	All components include an apdex	❌
	Logging includes metadata for measuring scalability	⚪ Not Implemented
	Developer guides exist in developer documentation	➖ Reason: pgbouncer is an infrastructure component, developers do not interact with it
	SRE guides exist in runbooks	✅ 1
	Metrics on downstream service usage	⚪ Not Implemented
Level 4	Prepared Kibana dashboards	⚪ Not Implemented
	Dashboards linked from metrics catalogs	⚪ Not Implemented
	Automatic alert routing	⚪ Not Implemented
Level 5	Long-term forecasting utilization and usage	⚪ Not Implemented
	70% of requests covered by at least one SLI	⚪ Not Implemented
	Automatic alert routing	⚪ Not Implemented

pgbouncer-ci detail

Level	Criterion	Passed
Level 1	Exists in the service catalog	✅ 1
	Structured logs available in Kibana	✅ 1
	Service exists in the dependency graph	✅ 1
Level 2	SLO monitoring: apdex	❌
	SLO monitoring: error rate	✅ 1
	SLO monitoring: request rate	✅ 1
Level 3	Service health dashboards	✅ 1
	SLA calculations driven from SLO metrics	➖ Reason: Service is not user facing
	All components include an apdex	❌
	Logging includes metadata for measuring scalability	⚪ Not Implemented
	Developer guides exist in developer documentation	➖ Reason: pgbouncer is an infrastructure component, developers do not interact with it
	SRE guides exist in runbooks	✅ 1
	Metrics on downstream service usage	⚪ Not Implemented
Level 4	Prepared Kibana dashboards	⚪ Not Implemented
	Dashboards linked from metrics catalogs	⚪ Not Implemented
	Automatic alert routing	⚪ Not Implemented
Level 5	Long-term forecasting utilization and usage	⚪ Not Implemented
	70% of requests covered by at least one SLI	⚪ Not Implemented
	Automatic alert routing	⚪ Not Implemented

pgbouncer-embedding detail

Level	Criterion	Passed
Level 1	Exists in the service catalog	✅ 1
	Structured logs available in Kibana	✅ 1
	Service exists in the dependency graph	✅ 1
Level 2	SLO monitoring: apdex	❌
	SLO monitoring: error rate	✅ 1
	SLO monitoring: request rate	✅ 1
Level 3	Service health dashboards	✅ 1
	SLA calculations driven from SLO metrics	➖ Reason: Service is not user facing
	All components include an apdex	❌
	Logging includes metadata for measuring scalability	⚪ Not Implemented
	Developer guides exist in developer documentation	➖ Reason: pgbouncer is an infrastructure component, developers do not interact with it
	SRE guides exist in runbooks	✅ 1
	Metrics on downstream service usage	⚪ Not Implemented
Level 4	Prepared Kibana dashboards	⚪ Not Implemented
	Dashboards linked from metrics catalogs	⚪ Not Implemented
	Automatic alert routing	⚪ Not Implemented
Level 5	Long-term forecasting utilization and usage	⚪ Not Implemented
	70% of requests covered by at least one SLI	⚪ Not Implemented
	Automatic alert routing	⚪ Not Implemented

pgbouncer-registry detail

Level	Criterion	Passed
Level 1	Exists in the service catalog	✅ 1
	Structured logs available in Kibana	✅ 1
	Service exists in the dependency graph	✅ 1
Level 2	SLO monitoring: apdex	❌
	SLO monitoring: error rate	✅ 1
	SLO monitoring: request rate	✅ 1
Level 3	Service health dashboards	✅ 1
	SLA calculations driven from SLO metrics	➖ Reason: Service is not user facing
	All components include an apdex	❌
	Logging includes metadata for measuring scalability	⚪ Not Implemented
	Developer guides exist in developer documentation	➖ Reason: pgbouncer is an infrastructure component, developers do not interact with it
	SRE guides exist in runbooks	✅ 1
	Metrics on downstream service usage	⚪ Not Implemented
Level 4	Prepared Kibana dashboards	⚪ Not Implemented
	Dashboards linked from metrics catalogs	⚪ Not Implemented
	Automatic alert routing	⚪ Not Implemented
Level 5	Long-term forecasting utilization and usage	⚪ Not Implemented
	70% of requests covered by at least one SLI	⚪ Not Implemented
	Automatic alert routing	⚪ Not Implemented

plantuml detail

Level	Criterion	Passed
Level 1	Exists in the service catalog	✅ 1
	Structured logs available in Kibana	➖ Reason: The logs are available in Stackdriver.
	Service exists in the dependency graph	➖ Reason: PlantUML is a stateless web application that generates UML diagrams on the fly. The rendered Markdown points to the PlantUML server in the frontends. It does not interact with any declared services.
Level 2	SLO monitoring: apdex	❌
	SLO monitoring: error rate	✅ 1
	SLO monitoring: request rate	✅ 1
Level 3	Service health dashboards	✅ 1, 2, 3, 4
	SLA calculations driven from SLO metrics	➖ Reason: Service is not user facing
	All components include an apdex	❌
	Logging includes metadata for measuring scalability	⚪ Not Implemented
	Developer guides exist in developer documentation	✅ 1
	SRE guides exist in runbooks	✅ 1
	Metrics on downstream service usage	⚪ Not Implemented
Level 4	Prepared Kibana dashboards	⚪ Not Implemented
	Dashboards linked from metrics catalogs	⚪ Not Implemented
	Automatic alert routing	⚪ Not Implemented
Level 5	Long-term forecasting utilization and usage	⚪ Not Implemented
	70% of requests covered by at least one SLI	⚪ Not Implemented
	Automatic alert routing	⚪ Not Implemented

postgres-archive detail

Level	Criterion	Passed
Level 1	Exists in the service catalog	✅ 1
	Structured logs available in Kibana	✅ 1, 2
	Service exists in the dependency graph	✅ 1
Level 2	SLO monitoring: apdex	❌
	SLO monitoring: error rate	✅ 1
	SLO monitoring: request rate	✅ 1
Level 3	Service health dashboards	✅ 1
	SLA calculations driven from SLO metrics	➖ Reason: Service is not user facing
	All components include an apdex	❌
	Logging includes metadata for measuring scalability	⚪ Not Implemented
	Developer guides exist in developer documentation	➖ Reason: postgres-archive is an infrastructure component, developers do not interact with it
	SRE guides exist in runbooks	✅ 1
	Metrics on downstream service usage	⚪ Not Implemented
Level 4	Prepared Kibana dashboards	⚪ Not Implemented
	Dashboards linked from metrics catalogs	⚪ Not Implemented
	Automatic alert routing	⚪ Not Implemented
Level 5	Long-term forecasting utilization and usage	⚪ Not Implemented
	70% of requests covered by at least one SLI	⚪ Not Implemented
	Automatic alert routing	⚪ Not Implemented

redis detail

Level	Criterion	Passed
Level 1	Exists in the service catalog	✅ 1
	Structured logs available in Kibana	✅ 1, 2
	Service exists in the dependency graph	✅ 1
Level 2	SLO monitoring: apdex	✅ 1
	SLO monitoring: error rate	✅ 1
	SLO monitoring: request rate	✅ 1
Level 3	Service health dashboards	✅ 1
	SLA calculations driven from SLO metrics	➖ Reason: Service is not user facing
	All components include an apdex	✅ 1
	Logging includes metadata for measuring scalability	➖ Reason: Metadata can't be injected in redis logs
	Developer guides exist in developer documentation	✅ 1
	SRE guides exist in runbooks	✅ 1
	Metrics on downstream service usage	⚪ Not Implemented
Level 4	Prepared Kibana dashboards	⚪ Not Implemented
	Dashboards linked from metrics catalogs	⚪ Not Implemented
	Automatic alert routing	⚪ Not Implemented
Level 5	Long-term forecasting utilization and usage	⚪ Not Implemented
	70% of requests covered by at least one SLI	⚪ Not Implemented
	Automatic alert routing	⚪ Not Implemented

redis-cluster-cache detail

Level	Criterion	Passed
Level 1	Exists in the service catalog	✅ 1
	Structured logs available in Kibana	✅ 1, 2
	Service exists in the dependency graph	✅ 1
Level 2	SLO monitoring: apdex	✅ 1
	SLO monitoring: error rate	✅ 1
	SLO monitoring: request rate	✅ 1
Level 3	Service health dashboards	✅ 1
	SLA calculations driven from SLO metrics	➖ Reason: Service is not user facing
	All components include an apdex	✅ 1
	Logging includes metadata for measuring scalability	➖ Reason: Metadata can't be injected in redis logs
	Developer guides exist in developer documentation	✅ 1
	SRE guides exist in runbooks	✅ 1
	Metrics on downstream service usage	⚪ Not Implemented
Level 4	Prepared Kibana dashboards	⚪ Not Implemented
	Dashboards linked from metrics catalogs	⚪ Not Implemented
	Automatic alert routing	⚪ Not Implemented
Level 5	Long-term forecasting utilization and usage	⚪ Not Implemented
	70% of requests covered by at least one SLI	⚪ Not Implemented
	Automatic alert routing	⚪ Not Implemented

redis-cluster-chat-cache detail

Level	Criterion	Passed
Level 1	Exists in the service catalog	✅ 1
	Structured logs available in Kibana	✅ 1, 2
	Service exists in the dependency graph	✅ 1
Level 2	SLO monitoring: apdex	✅ 1
	SLO monitoring: error rate	✅ 1
	SLO monitoring: request rate	✅ 1
Level 3	Service health dashboards	✅ 1
	SLA calculations driven from SLO metrics	➖ Reason: Service is not user facing
	All components include an apdex	✅ 1
	Logging includes metadata for measuring scalability	➖ Reason: Metadata can't be injected in redis logs
	Developer guides exist in developer documentation	✅ 1
	SRE guides exist in runbooks	✅ 1
	Metrics on downstream service usage	⚪ Not Implemented
Level 4	Prepared Kibana dashboards	⚪ Not Implemented
	Dashboards linked from metrics catalogs	⚪ Not Implemented
	Automatic alert routing	⚪ Not Implemented
Level 5	Long-term forecasting utilization and usage	⚪ Not Implemented
	70% of requests covered by at least one SLI	⚪ Not Implemented
	Automatic alert routing	⚪ Not Implemented

redis-cluster-feature-flag detail

Level	Criterion	Passed
Level 1	Exists in the service catalog	✅ 1
	Structured logs available in Kibana	✅ 1, 2
	Service exists in the dependency graph	✅ 1
Level 2	SLO monitoring: apdex	✅ 1
	SLO monitoring: error rate	✅ 1
	SLO monitoring: request rate	✅ 1
Level 3	Service health dashboards	✅ 1
	SLA calculations driven from SLO metrics	➖ Reason: Service is not user facing
	All components include an apdex	✅ 1
	Logging includes metadata for measuring scalability	➖ Reason: Metadata can't be injected in redis logs
	Developer guides exist in developer documentation	✅ 1
	SRE guides exist in runbooks	✅ 1
	Metrics on downstream service usage	⚪ Not Implemented
Level 4	Prepared Kibana dashboards	⚪ Not Implemented
	Dashboards linked from metrics catalogs	⚪ Not Implemented
	Automatic alert routing	⚪ Not Implemented
Level 5	Long-term forecasting utilization and usage	⚪ Not Implemented
	70% of requests covered by at least one SLI	⚪ Not Implemented
	Automatic alert routing	⚪ Not Implemented

redis-cluster-queues-meta detail

Level	Criterion	Passed
Level 1	Exists in the service catalog	✅ 1
	Structured logs available in Kibana	✅ 1, 2
	Service exists in the dependency graph	✅ 1
Level 2	SLO monitoring: apdex	✅ 1
	SLO monitoring: error rate	✅ 1
	SLO monitoring: request rate	✅ 1
Level 3	Service health dashboards	✅ 1
	SLA calculations driven from SLO metrics	➖ Reason: Service is not user facing
	All components include an apdex	✅ 1
	Logging includes metadata for measuring scalability	➖ Reason: Metadata can't be injected in redis logs
	Developer guides exist in developer documentation	✅ 1
	SRE guides exist in runbooks	✅ 1
	Metrics on downstream service usage	⚪ Not Implemented
Level 4	Prepared Kibana dashboards	⚪ Not Implemented
	Dashboards linked from metrics catalogs	⚪ Not Implemented
	Automatic alert routing	⚪ Not Implemented
Level 5	Long-term forecasting utilization and usage	⚪ Not Implemented
	70% of requests covered by at least one SLI	⚪ Not Implemented
	Automatic alert routing	⚪ Not Implemented

redis-cluster-ratelimiting detail

Level	Criterion	Passed
Level 1	Exists in the service catalog	✅ 1
	Structured logs available in Kibana	✅ 1, 2
	Service exists in the dependency graph	✅ 1
Level 2	SLO monitoring: apdex	✅ 1
	SLO monitoring: error rate	✅ 1
	SLO monitoring: request rate	✅ 1
Level 3	Service health dashboards	✅ 1
	SLA calculations driven from SLO metrics	➖ Reason: Service is not user facing
	All components include an apdex	✅ 1
	Logging includes metadata for measuring scalability	➖ Reason: Metadata can't be injected in redis logs
	Developer guides exist in developer documentation	✅ 1
	SRE guides exist in runbooks	⚪ Not Implemented
	Metrics on downstream service usage	⚪ Not Implemented
Level 4	Prepared Kibana dashboards	⚪ Not Implemented
	Dashboards linked from metrics catalogs	⚪ Not Implemented
	Automatic alert routing	⚪ Not Implemented
Level 5	Long-term forecasting utilization and usage	⚪ Not Implemented
	70% of requests covered by at least one SLI	⚪ Not Implemented
	Automatic alert routing	⚪ Not Implemented

redis-cluster-repo-cache detail

Level	Criterion	Passed
Level 1	Exists in the service catalog	✅ 1
	Structured logs available in Kibana	✅ 1, 2
	Service exists in the dependency graph	✅ 1
Level 2	SLO monitoring: apdex	✅ 1
	SLO monitoring: error rate	✅ 1
	SLO monitoring: request rate	✅ 1
Level 3	Service health dashboards	✅ 1
	SLA calculations driven from SLO metrics	➖ Reason: Service is not user facing
	All components include an apdex	✅ 1
	Logging includes metadata for measuring scalability	➖ Reason: Metadata can't be injected in redis logs
	Developer guides exist in developer documentation	✅ 1
	SRE guides exist in runbooks	✅ 1
	Metrics on downstream service usage	⚪ Not Implemented
Level 4	Prepared Kibana dashboards	⚪ Not Implemented
	Dashboards linked from metrics catalogs	⚪ Not Implemented
	Automatic alert routing	⚪ Not Implemented
Level 5	Long-term forecasting utilization and usage	⚪ Not Implemented
	70% of requests covered by at least one SLI	⚪ Not Implemented
	Automatic alert routing	⚪ Not Implemented

redis-cluster-shared-state detail

Level	Criterion	Passed
Level 1	Exists in the service catalog	✅ 1
	Structured logs available in Kibana	✅ 1, 2
	Service exists in the dependency graph	✅ 1
Level 2	SLO monitoring: apdex	✅ 1
	SLO monitoring: error rate	✅ 1
	SLO monitoring: request rate	✅ 1
Level 3	Service health dashboards	✅ 1
	SLA calculations driven from SLO metrics	➖ Reason: Service is not user facing
	All components include an apdex	✅ 1
	Logging includes metadata for measuring scalability	➖ Reason: Metadata can't be injected in redis logs
	Developer guides exist in developer documentation	✅ 1
	SRE guides exist in runbooks	✅ 1
	Metrics on downstream service usage	⚪ Not Implemented
Level 4	Prepared Kibana dashboards	⚪ Not Implemented
	Dashboards linked from metrics catalogs	⚪ Not Implemented
	Automatic alert routing	⚪ Not Implemented
Level 5	Long-term forecasting utilization and usage	⚪ Not Implemented
	70% of requests covered by at least one SLI	⚪ Not Implemented
	Automatic alert routing	⚪ Not Implemented

redis-db-load-balancing detail

Level	Criterion	Passed
Level 1	Exists in the service catalog	✅ 1
	Structured logs available in Kibana	✅ 1, 2
	Service exists in the dependency graph	✅ 1
Level 2	SLO monitoring: apdex	✅ 1
	SLO monitoring: error rate	✅ 1
	SLO monitoring: request rate	✅ 1
Level 3	Service health dashboards	✅ 1
	SLA calculations driven from SLO metrics	➖ Reason: Service is not user facing
	All components include an apdex	✅ 1
	Logging includes metadata for measuring scalability	➖ Reason: Metadata can't be injected in redis logs
	Developer guides exist in developer documentation	✅ 1
	SRE guides exist in runbooks	✅ 1
	Metrics on downstream service usage	⚪ Not Implemented
Level 4	Prepared Kibana dashboards	⚪ Not Implemented
	Dashboards linked from metrics catalogs	⚪ Not Implemented
	Automatic alert routing	⚪ Not Implemented
Level 5	Long-term forecasting utilization and usage	⚪ Not Implemented
	70% of requests covered by at least one SLI	⚪ Not Implemented
	Automatic alert routing	⚪ Not Implemented

redis-pubsub detail

Level	Criterion	Passed
Level 1	Exists in the service catalog	✅ 1
	Structured logs available in Kibana	✅ 1, 2
	Service exists in the dependency graph	✅ 1
Level 2	SLO monitoring: apdex	✅ 1
	SLO monitoring: error rate	✅ 1
	SLO monitoring: request rate	✅ 1
Level 3	Service health dashboards	✅ 1
	SLA calculations driven from SLO metrics	➖ Reason: Service is not user facing
	All components include an apdex	✅ 1
	Logging includes metadata for measuring scalability	➖ Reason: Metadata can't be injected in redis logs
	Developer guides exist in developer documentation	✅ 1
	SRE guides exist in runbooks	✅ 1
	Metrics on downstream service usage	⚪ Not Implemented
Level 4	Prepared Kibana dashboards	⚪ Not Implemented
	Dashboards linked from metrics catalogs	⚪ Not Implemented
	Automatic alert routing	⚪ Not Implemented
Level 5	Long-term forecasting utilization and usage	⚪ Not Implemented
	70% of requests covered by at least one SLI	⚪ Not Implemented
	Automatic alert routing	⚪ Not Implemented

redis-registry-cache detail

Level	Criterion	Passed
Level 1	Exists in the service catalog	✅ 1
	Structured logs available in Kibana	✅ 1, 2
	Service exists in the dependency graph	✅ 1
Level 2	SLO monitoring: apdex	✅ 1
	SLO monitoring: error rate	✅ 1
	SLO monitoring: request rate	✅ 1
Level 3	Service health dashboards	✅ 1
	SLA calculations driven from SLO metrics	➖ Reason: Service is not user facing
	All components include an apdex	✅ 1
	Logging includes metadata for measuring scalability	➖ Reason: Metadata can't be injected in redis logs
	Developer guides exist in developer documentation	❌
	SRE guides exist in runbooks	❌
	Metrics on downstream service usage	⚪ Not Implemented
Level 4	Prepared Kibana dashboards	⚪ Not Implemented
	Dashboards linked from metrics catalogs	⚪ Not Implemented
	Automatic alert routing	⚪ Not Implemented
Level 5	Long-term forecasting utilization and usage	⚪ Not Implemented
	70% of requests covered by at least one SLI	⚪ Not Implemented
	Automatic alert routing	⚪ Not Implemented

redis-sessions detail

Level	Criterion	Passed
Level 1	Exists in the service catalog	✅ 1
	Structured logs available in Kibana	✅ 1, 2
	Service exists in the dependency graph	✅ 1
Level 2	SLO monitoring: apdex	✅ 1
	SLO monitoring: error rate	✅ 1
	SLO monitoring: request rate	✅ 1
Level 3	Service health dashboards	✅ 1
	SLA calculations driven from SLO metrics	➖ Reason: Service is not user facing
	All components include an apdex	✅ 1
	Logging includes metadata for measuring scalability	➖ Reason: Metadata can't be injected in redis logs
	Developer guides exist in developer documentation	✅ 1
	SRE guides exist in runbooks	✅ 1
	Metrics on downstream service usage	⚪ Not Implemented
Level 4	Prepared Kibana dashboards	⚪ Not Implemented
	Dashboards linked from metrics catalogs	⚪ Not Implemented
	Automatic alert routing	⚪ Not Implemented
Level 5	Long-term forecasting utilization and usage	⚪ Not Implemented
	70% of requests covered by at least one SLI	⚪ Not Implemented
	Automatic alert routing	⚪ Not Implemented

redis-sidekiq detail

Level	Criterion	Passed
Level 1	Exists in the service catalog	✅ 1
	Structured logs available in Kibana	✅ 1, 2
	Service exists in the dependency graph	✅ 1
Level 2	SLO monitoring: apdex	✅ 1
	SLO monitoring: error rate	✅ 1
	SLO monitoring: request rate	✅ 1
Level 3	Service health dashboards	✅ 1
	SLA calculations driven from SLO metrics	➖ Reason: Service is not user facing
	All components include an apdex	✅ 1
	Logging includes metadata for measuring scalability	➖ Reason: Metadata can't be injected in redis logs
	Developer guides exist in developer documentation	✅ 1
	SRE guides exist in runbooks	✅ 1
	Metrics on downstream service usage	⚪ Not Implemented
Level 4	Prepared Kibana dashboards	⚪ Not Implemented
	Dashboards linked from metrics catalogs	⚪ Not Implemented
	Automatic alert routing	⚪ Not Implemented
Level 5	Long-term forecasting utilization and usage	⚪ Not Implemented
	70% of requests covered by at least one SLI	⚪ Not Implemented
	Automatic alert routing	⚪ Not Implemented

redis-tracechunks detail

Level	Criterion	Passed
Level 1	Exists in the service catalog	✅ 1
	Structured logs available in Kibana	✅ 1, 2
	Service exists in the dependency graph	✅ 1
Level 2	SLO monitoring: apdex	✅ 1
	SLO monitoring: error rate	✅ 1
	SLO monitoring: request rate	✅ 1
Level 3	Service health dashboards	✅ 1
	SLA calculations driven from SLO metrics	➖ Reason: Service is not user facing
	All components include an apdex	✅ 1
	Logging includes metadata for measuring scalability	➖ Reason: Metadata can't be injected in redis logs
	Developer guides exist in developer documentation	✅ 1
	SRE guides exist in runbooks	✅ 1
	Metrics on downstream service usage	⚪ Not Implemented
Level 4	Prepared Kibana dashboards	⚪ Not Implemented
	Dashboards linked from metrics catalogs	⚪ Not Implemented
	Automatic alert routing	⚪ Not Implemented
Level 5	Long-term forecasting utilization and usage	⚪ Not Implemented
	70% of requests covered by at least one SLI	⚪ Not Implemented
	Automatic alert routing	⚪ Not Implemented

registry detail

Level	Criterion	Passed
Level 1	Exists in the service catalog	✅ 1
	Structured logs available in Kibana	✅ 1, 2, 3, 4, 5
	Service exists in the dependency graph	✅ 1
Level 2	SLO monitoring: apdex	✅ 1
	SLO monitoring: error rate	✅ 1
	SLO monitoring: request rate	✅ 1
Level 3	Service health dashboards	✅ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13
	SLA calculations driven from SLO metrics	⚪ Not Implemented
	All components include an apdex	❌
	Logging includes metadata for measuring scalability	⚪ Not Implemented
	Developer guides exist in developer documentation	✅ 1
	SRE guides exist in runbooks	✅ 1
	Metrics on downstream service usage	⚪ Not Implemented
Level 4	Prepared Kibana dashboards	⚪ Not Implemented
	Dashboards linked from metrics catalogs	⚪ Not Implemented
	Automatic alert routing	⚪ Not Implemented
Level 5	Long-term forecasting utilization and usage	⚪ Not Implemented
	70% of requests covered by at least one SLI	⚪ Not Implemented
	Automatic alert routing	⚪ Not Implemented

runway detail

Level	Criterion	Passed
Level 1	Exists in the service catalog	✅ 1
	Structured logs available in Kibana	➖ Reason: Runway is a platform. The logs are available in Stackdriver.
	Service exists in the dependency graph	✅ 1
Level 2	SLO monitoring: apdex	❌
	SLO monitoring: error rate	❌
	SLO monitoring: request rate	❌
Level 3	Service health dashboards	✅ 1, 2, 3
	SLA calculations driven from SLO metrics	➖ Reason: Service is not user facing
	All components include an apdex	❌
	Logging includes metadata for measuring scalability	⚪ Not Implemented
	Developer guides exist in developer documentation	✅ 1
	SRE guides exist in runbooks	✅ 1
	Metrics on downstream service usage	⚪ Not Implemented
Level 4	Prepared Kibana dashboards	⚪ Not Implemented
	Dashboards linked from metrics catalogs	⚪ Not Implemented
	Automatic alert routing	⚪ Not Implemented
Level 5	Long-term forecasting utilization and usage	⚪ Not Implemented
	70% of requests covered by at least one SLI	⚪ Not Implemented
	Automatic alert routing	⚪ Not Implemented

search detail

Level	Criterion	Passed
Level 1	Exists in the service catalog	✅ 1
	Structured logs available in Kibana	✅ 1, 2
	Service exists in the dependency graph	✅ 1
Level 2	SLO monitoring: apdex	✅ 1
	SLO monitoring: error rate	❌
	SLO monitoring: request rate	✅ 1
Level 3	Service health dashboards	✅ 1, 2
	SLA calculations driven from SLO metrics	➖ Reason: Service is not user facing
	All components include an apdex	❌
	Logging includes metadata for measuring scalability	⚪ Not Implemented
	Developer guides exist in developer documentation	✅ 1
	SRE guides exist in runbooks	✅ 1
	Metrics on downstream service usage	⚪ Not Implemented
Level 4	Prepared Kibana dashboards	⚪ Not Implemented
	Dashboards linked from metrics catalogs	⚪ Not Implemented
	Automatic alert routing	⚪ Not Implemented
Level 5	Long-term forecasting utilization and usage	⚪ Not Implemented
	70% of requests covered by at least one SLI	⚪ Not Implemented
	Automatic alert routing	⚪ Not Implemented

sentry detail

Level	Criterion	Passed
Level 1	Exists in the service catalog	✅ 1
	Structured logs available in Kibana	➖ Reason: We are migrating our self-managed Sentry instance to the hosted one; see https://gitlab.com/gitlab-com/gl-infra/reliability/-/issues/13963. Sentry logs are also available in Stackdriver.
	Service exists in the dependency graph	➖ Reason: Sentry is an independent internal observability tool
Level 2	SLO monitoring: apdex	✅ 1
	SLO monitoring: error rate	✅ 1
	SLO monitoring: request rate	✅ 1
Level 3	Service health dashboards	✅ 1, 2, 3
	SLA calculations driven from SLO metrics	➖ Reason: Service is not user facing
	All components include an apdex	❌
	Logging includes metadata for measuring scalability	⚪ Not Implemented
	Developer guides exist in developer documentation	✅ 1
	SRE guides exist in runbooks	✅ 1
	Metrics on downstream service usage	⚪ Not Implemented
Level 4	Prepared Kibana dashboards	⚪ Not Implemented
	Dashboards linked from metrics catalogs	⚪ Not Implemented
	Automatic alert routing	⚪ Not Implemented
Level 5	Long-term forecasting utilization and usage	⚪ Not Implemented
	70% of requests covered by at least one SLI	⚪ Not Implemented
	Automatic alert routing	⚪ Not Implemented

sidekiq detail

Level	Criterion	Passed
Level 1	Exists in the service catalog	✅ 1
	Structured logs available in Kibana	✅ 1, 2, 3, 4, 5, 6, 7
	Service exists in the dependency graph	✅ 1
Level 2	SLO monitoring: apdex	✅ 1
	SLO monitoring: error rate	✅ 1
	SLO monitoring: request rate	✅ 1
Level 3	Service health dashboards	✅ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12
	SLA calculations driven from SLO metrics	⚪ Not Implemented
	All components include an apdex	❌
	Logging includes metadata for measuring scalability	⚪ Not Implemented
	Developer guides exist in developer documentation	✅ 1
	SRE guides exist in runbooks	✅ 1
	Metrics on downstream service usage	⚪ Not Implemented
Level 4	Prepared Kibana dashboards	⚪ Not Implemented
	Dashboards linked from metrics catalogs	⚪ Not Implemented
	Automatic alert routing	⚪ Not Implemented
Level 5	Long-term forecasting utilization and usage	⚪ Not Implemented
	70% of requests covered by at least one SLI	⚪ Not Implemented
	Automatic alert routing	⚪ Not Implemented

thanos detail

Level	Criterion	Passed
Level 1	Exists in the service catalog	✅ 1
	Structured logs available in Kibana	✅ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12
	Service exists in the dependency graph	➖ Reason: Thanos is an independent internal observability tool. It fetches metrics from other services but does not interact with them functionally.
Level 2	SLO monitoring: apdex	✅ 1
	SLO monitoring: error rate	✅ 1
	SLO monitoring: request rate	✅ 1
Level 3	Service health dashboards	✅ 1, 2, 3, 4, 5
	SLA calculations driven from SLO metrics	➖ Reason: Service is not user facing
	All components include an apdex	❌
	Logging includes metadata for measuring scalability	⚪ Not Implemented
	Developer guides exist in developer documentation	⚪ Not Implemented
	SRE guides exist in runbooks	✅ 1
	Metrics on downstream service usage	⚪ Not Implemented
Level 4	Prepared Kibana dashboards	⚪ Not Implemented
	Dashboards linked from metrics catalogs	⚪ Not Implemented
	Automatic alert routing	⚪ Not Implemented
Level 5	Long-term forecasting utilization and usage	⚪ Not Implemented
	70% of requests covered by at least one SLI	⚪ Not Implemented
	Automatic alert routing	⚪ Not Implemented

tracing detail

Level	Criterion	Passed
Level 1	Exists in the service catalog	✅ 1
	Structured logs available in Kibana	✅ 1
	Service exists in the dependency graph	✅ 1
Level 2	SLO monitoring: apdex	✅ 1
	SLO monitoring: error rate	✅ 1
	SLO monitoring: request rate	✅ 1
Level 3	Service health dashboards	✅ 1
	SLA calculations driven from SLO metrics	➖ Reason: Service is not user facing
	All components include an apdex	✅ 1
	Logging includes metadata for measuring scalability	⚪ Not Implemented
	Developer guides exist in developer documentation	✅ 1
	SRE guides exist in runbooks	❌
	Metrics on downstream service usage	⚪ Not Implemented
Level 4	Prepared Kibana dashboards	⚪ Not Implemented
	Dashboards linked from metrics catalogs	⚪ Not Implemented
	Automatic alert routing	⚪ Not Implemented
Level 5	Long-term forecasting utilization and usage	⚪ Not Implemented
	70% of requests covered by at least one SLI	⚪ Not Implemented
	Automatic alert routing	⚪ Not Implemented

vault detail

Level	Criterion	Passed
Level 1	Exists in the service catalog	✅ 1
	Structured logs available in Kibana	➖ Reason: Vault is currently a pending project and receives no traffic. We'll add logs and metrics in https://gitlab.com/groups/gitlab-com/gl-infra/-/epics/739
	Service exists in the dependency graph	➖ Reason: Vault is currently a pending project and receives no traffic. Progress can be tracked at https://gitlab.com/groups/gitlab-com/gl-infra/-/epics/739
Level 2	SLO monitoring: apdex	✅ 1
	SLO monitoring: error rate	✅ 1
	SLO monitoring: request rate	✅ 1
Level 3	Service health dashboards	✅ 1
	SLA calculations driven from SLO metrics	➖ Reason: Service is not user facing
	All components include an apdex	❌
	Logging includes metadata for measuring scalability	⚪ Not Implemented
	Developer guides exist in developer documentation	➖ Reason: Vault is an infrastructure component; developers do not interact with it
	SRE guides exist in runbooks	✅ 1
	Metrics on downstream service usage	⚪ Not Implemented
Level 4	Prepared Kibana dashboards	⚪ Not Implemented
	Dashboards linked from metrics catalogs	⚪ Not Implemented
	Automatic alert routing	⚪ Not Implemented
Level 5	Long-term forecasting utilization and usage	⚪ Not Implemented
	70% of requests covered by at least one SLI	⚪ Not Implemented
	Automatic alert routing	⚪ Not Implemented

web detail

Level	Criterion	Passed
Level 1	Exists in the service catalog	✅ 1
	Structured logs available in Kibana	✅ 1, 2, 3, 4, 5, 6, 7, 8
	Service exists in the dependency graph	✅ 1
Level 2	SLO monitoring: apdex	✅ 1
	SLO monitoring: error rate	✅ 1
	SLO monitoring: request rate	✅ 1
Level 3	Service health dashboards	✅ 1, 2, 3, 4, 5, 6, 7
	SLA calculations driven from SLO metrics	⚪ Not Implemented
	All components include an apdex	❌
	Logging includes metadata for measuring scalability	⚪ Not Implemented
	Developer guides exist in developer documentation	✅ 1
	SRE guides exist in runbooks	✅ 1
	Metrics on downstream service usage	⚪ Not Implemented
Level 4	Prepared Kibana dashboards	⚪ Not Implemented
	Dashboards linked from metrics catalogs	⚪ Not Implemented
	Automatic alert routing	⚪ Not Implemented
Level 5	Long-term forecasting utilization and usage	⚪ Not Implemented
	70% of requests covered by at least one SLI	⚪ Not Implemented
	Automatic alert routing	⚪ Not Implemented

web-pages detail

Level	Criterion	Passed
Level 1	Exists in the service catalog	✅ 1
	Structured logs available in Kibana	✅ 1, 2
	Service exists in the dependency graph	✅ 1
Level 2	SLO monitoring: apdex	✅ 1
	SLO monitoring: error rate	✅ 1
	SLO monitoring: request rate	✅ 1
Level 3	Service health dashboards	✅ 1, 2, 3, 4, 5, 6, 7
	SLA calculations driven from SLO metrics	⚪ Not Implemented
	All components include an apdex	❌
	Logging includes metadata for measuring scalability	⚪ Not Implemented
	Developer guides exist in developer documentation	✅ 1
	SRE guides exist in runbooks	✅ 1
	Metrics on downstream service usage	⚪ Not Implemented
Level 4	Prepared Kibana dashboards	⚪ Not Implemented
	Dashboards linked from metrics catalogs	⚪ Not Implemented
	Automatic alert routing	⚪ Not Implemented
Level 5	Long-term forecasting utilization and usage	⚪ Not Implemented
	70% of requests covered by at least one SLI	⚪ Not Implemented
	Automatic alert routing	⚪ Not Implemented

websockets detail

Level	Criterion	Passed
Level 1	Exists in the service catalog	✅ 1
	Structured logs available in Kibana	✅ 1, 2, 3, 4, 5, 6
	Service exists in the dependency graph	✅ 1
Level 2	SLO monitoring: apdex	✅ 1
	SLO monitoring: error rate	✅ 1
	SLO monitoring: request rate	✅ 1
Level 3	Service health dashboards	✅ 1, 2, 3, 4, 5, 6
	SLA calculations driven from SLO metrics	➖ Reason: Service is not user facing
	All components include an apdex	❌
	Logging includes metadata for measuring scalability	⚪ Not Implemented
	Developer guides exist in developer documentation	✅ 1
	SRE guides exist in runbooks	✅ 1
	Metrics on downstream service usage	⚪ Not Implemented
Level 4	Prepared Kibana dashboards	⚪ Not Implemented
	Dashboards linked from metrics catalogs	⚪ Not Implemented
	Automatic alert routing	⚪ Not Implemented
Level 5	Long-term forecasting utilization and usage	⚪ Not Implemented
	70% of requests covered by at least one SLI	⚪ Not Implemented
	Automatic alert routing	⚪ Not Implemented

woodhouse detail

Level	Criterion	Passed
Level 1	Exists in the service catalog	✅ 1
	Structured logs available in Kibana	➖ Reason: Log volume is very low; the provided tooling links to Stackdriver are sufficient for this purpose
	Service exists in the dependency graph	✅ 1
Level 2	SLO monitoring: apdex	✅ 1
	SLO monitoring: error rate	✅ 1
	SLO monitoring: request rate	✅ 1
Level 3	Service health dashboards	✅ 1
	SLA calculations driven from SLO metrics	➖ Reason: Service is not user facing
	All components include an apdex	✅ 1
	Logging includes metadata for measuring scalability	⚪ Not Implemented
	Developer guides exist in developer documentation	✅ 1
	SRE guides exist in runbooks	✅ 1
	Metrics on downstream service usage	⚪ Not Implemented
Level 4	Prepared Kibana dashboards	⚪ Not Implemented
	Dashboards linked from metrics catalogs	⚪ Not Implemented
	Automatic alert routing	⚪ Not Implemented
Level 5	Long-term forecasting utilization and usage	⚪ Not Implemented
	70% of requests covered by at least one SLI	⚪ Not Implemented
	Automatic alert routing	⚪ Not Implemented
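
The detail tables above can be read back into the per-service level shown in the summary. The following is a minimal sketch of that reading, not the metrics catalog's actual implementation: the function name `maturity_level` and the status strings are hypothetical, and the rule it applies is an assumption inferred from the tables (a level counts as achieved when it has no ❌ criterion and at least one ✅ criterion, with ➖ and ⚪ rows ignored, and levels must be achieved consecutively).

```python
from typing import List

# Hypothetical status labels standing in for the symbols used in the tables.
PASSED = "passed"                # ✅
FAILED = "failed"                # ❌
SKIPPED = "skipped"              # ➖ (criterion does not apply, reason given)
UNIMPLEMENTED = "unimplemented"  # ⚪ (criterion is not evaluated yet)


def maturity_level(levels: List[List[str]]) -> int:
    """Return the highest consecutively achieved level (0 if Level 1 is not met).

    Assumption inferred from the tables: a level is achieved when none of its
    criteria failed and at least one passed; skipped and unimplemented
    criteria neither help nor block.
    """
    achieved = 0
    for number, results in enumerate(levels, start=1):
        if FAILED in results or PASSED not in results:
            break  # higher levels cannot count once one level is missed
        achieved = number
    return achieved


# Criterion results transcribed from the "search detail" table.
search = [
    [PASSED, PASSED, PASSED],
    [PASSED, FAILED, PASSED],
    [PASSED, SKIPPED, FAILED, UNIMPLEMENTED, PASSED, PASSED, UNIMPLEMENTED],
    [UNIMPLEMENTED] * 3,
    [UNIMPLEMENTED] * 3,
]

# Criterion results transcribed from the "tracing detail" table.
tracing = [
    [PASSED, PASSED, PASSED],
    [PASSED, PASSED, PASSED],
    [PASSED, SKIPPED, PASSED, UNIMPLEMENTED, PASSED, FAILED, UNIMPLEMENTED],
    [UNIMPLEMENTED] * 3,
    [UNIMPLEMENTED] * 3,
]

print("search:", maturity_level(search))
print("tracing:", maturity_level(tracing))
```

Under these assumptions the sketch yields Level 1 for search (its error rate SLO criterion fails at Level 2) and Level 2 for tracing (its runbook criterion fails at Level 3), which agrees with the summary table at the top of the page.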

Last modified June 19, 2025: Move more pages from infrastructure to infrastructure-plaforms (9026ac0b)
