Service Maturity Model
Introduction
This page shows the output of our service maturity model for each service in our metrics catalog. The model itself is part of the metrics catalog, and uses information from the metrics catalog and the service catalog to score each service.
To achieve a particular level in the maturity model, a service must meet all the criteria for that level and all previous levels. Some criteria do not apply to all services (for instance, services like PgBouncer do not need development documentation).
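The level rule above can be sketched in a few lines of Python. This is an illustrative sketch only, not the metrics catalog's actual implementation: the `Status` enum and the `maturity_level` function are hypothetical names. Treating an unmeasured criterion as blocking its level is an assumption, since this page does not say how such criteria affect the score.

```python
from enum import Enum


class Status(Enum):
    """Hypothetical per-criterion outcomes, mirroring the symbols used on this page."""
    PASSED = "passed"          # service meets the criterion
    FAILED = "failed"          # service does not meet the criterion
    SKIPPED = "skipped"        # criterion does not apply to this service
    UNMEASURED = "unmeasured"  # criterion is not measured yet


def maturity_level(levels: list[list[Status]]) -> int:
    """Return the highest level for which that level and all previous levels are met.

    levels[0] holds the Level 1 criteria, levels[1] the Level 2 criteria, and so on.
    A result of 0 means the service does not meet even the Level 1 criteria.
    """
    achieved = 0
    for number, criteria in enumerate(levels, start=1):
        # Skipped criteria do not apply to the service, so they are ignored.
        applicable = [status for status in criteria if status is not Status.SKIPPED]
        # Assumption: an unmeasured criterion blocks the level just like a failed one.
        if any(status is not Status.PASSED for status in applicable):
            break
        achieved = number
    return achieved
```

For a hypothetical service that passes every applicable Level 1 criterion but fails one Level 2 criterion, `maturity_level` returns 1, whatever its results at higher levels.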
Maturity score by service
❌ indicates the service does not meet even the Level 1 criteria
Maturity detail by service
Key:
- ✅ Service meets the criterion
- ❌ Service does not meet the criterion
- ➖ The criterion is skipped. Some maturity criteria make less sense for some services. For example, an infrastructure-facing service like Patroni is crucial to ops but is not related to our Development department, so it does not require development guidelines.
- ⚪ We don’t measure the criterion yet. See https://gitlab.com/groups/gitlab-com/gl-infra/-/epics/560 for progress
ai-assisted detail
Level | Criterion | Passed |
---|---|---|
Level 1 | Exists in the service catalog | ✅ 1 |
Structured logs available in Kibana | ✅ 1, 2, 3, 4, 5, 6 | |
Service exists in the dependency graph | ✅ 1 | |
Level 2 | SLO monitoring: apdex | ✅ 1 |
SLO monitoring: error rate | ✅ 1 | |
SLO monitoring: request rate | ✅ 1 | |
Level 3 | Service health dashboards | ✅ 1 |
SLA calculations driven from SLO metrics | ⚪ Not Implemented | |
All components include an apdex | ✅ 1 | |
Logging includes metadata for measuring scalability | ⚪ Not Implemented | |
Developer guides exist in developer documentation | ✅ 1 | |
SRE guides exist in runbooks | ✅ 1 | |
Metrics on downstream service usage | ⚪ Not Implemented | |
Level 4 | Prepared Kibana dashboards | ⚪ Not Implemented |
Dashboards linked from metrics catalogs | ⚪ Not Implemented | |
Automatic alert routing | ⚪ Not Implemented | |
Level 5 | Long-term forecasting utilization and usage | ⚪ Not Implemented |
70% of requests covered by at least one SLI | ⚪ Not Implemented | |
Automatic alert routing | ⚪ Not Implemented |
ai-gateway detail
Level | Criterion | Passed |
---|---|---|
Level 1 | Exists in the service catalog | ✅ 1 |
Structured logs available in Kibana | ➖ Reason: Runway structured logs are temporarily available in Stackdriver | |
Service exists in the dependency graph | ➖ Reason: Runway services are deployed outside of the monolith | |
Level 2 | SLO monitoring: apdex | ✅ 1 |
SLO monitoring: error rate | ✅ 1 | |
SLO monitoring: request rate | ✅ 1 | |
Level 3 | Service health dashboards | ✅ 1, 2, 3, 4 |
SLA calculations driven from SLO metrics | ➖ Reason: Service is not user facing | |
All components include an apdex | ❌ | |
Logging includes metadata for measuring scalability | ⚪ Not Implemented | |
Developer guides exist in developer documentation | ❌ | |
SRE guides exist in runbooks | ❌ | |
Metrics on downstream service usage | ⚪ Not Implemented | |
Level 4 | Prepared Kibana dashboards | ⚪ Not Implemented |
Dashboards linked from metrics catalogs | ⚪ Not Implemented | |
Automatic alert routing | ⚪ Not Implemented | |
Level 5 | Long-term forecasting utilization and usage | ⚪ Not Implemented |
70% of requests covered by at least one SLI | ⚪ Not Implemented | |
Automatic alert routing | ⚪ Not Implemented |
api detail
Level | Criterion | Passed |
---|---|---|
Level 1 | Exists in the service catalog | ✅ 1 |
Structured logs available in Kibana | ✅ 1, 2, 3, 4, 5, 6, 7, 8, 9 | |
Service exists in the dependency graph | ✅ 1 | |
Level 2 | SLO monitoring: apdex | ✅ 1 |
SLO monitoring: error rate | ✅ 1 | |
SLO monitoring: request rate | ✅ 1 | |
Level 3 | Service health dashboards | ✅ 1, 2, 3, 4, 5, 6, 7, 8 |
SLA calculations driven from SLO metrics | ⚪ Not Implemented | |
All components include an apdex | ❌ | |
Logging includes metadata for measuring scalability | ⚪ Not Implemented | |
Developer guides exist in developer documentation | ✅ 1 | |
SRE guides exist in runbooks | ✅ 1 | |
Metrics on downstream service usage | ⚪ Not Implemented | |
Level 4 | Prepared Kibana dashboards | ⚪ Not Implemented |
Dashboards linked from metrics catalogs | ⚪ Not Implemented | |
Automatic alert routing | ⚪ Not Implemented | |
Level 5 | Long-term forecasting utilization and usage | ⚪ Not Implemented |
70% of requests covered by at least one SLI | ⚪ Not Implemented | |
Automatic alert routing | ⚪ Not Implemented |
atlantis detail
Level | Criterion | Passed |
---|---|---|
Level 1 | Exists in the service catalog | ✅ 1 |
Structured logs available in Kibana | ➖ Reason: Atlantis is a work in progress, see https://gitlab.com/gitlab-com/gl-infra/reliability/-/issues/24613 | |
Service exists in the dependency graph | ➖ Reason: Atlantis is a work in progress, see https://gitlab.com/gitlab-com/gl-infra/reliability/-/issues/24613 | |
Level 2 | SLO monitoring: apdex | ❌ |
SLO monitoring: error rate | ✅ 1 | |
SLO monitoring: request rate | ✅ 1 | |
Level 3 | Service health dashboards | ✅ 1 |
SLA calculations driven from SLO metrics | ➖ Reason: Service is not user facing | |
All components include an apdex | ❌ | |
Logging includes metadata for measuring scalability | ⚪ Not Implemented | |
Developer guides exist in developer documentation | ➖ Reason: Atlantis is an infrastructure component, developers do not interact with it | |
SRE guides exist in runbooks | ✅ 1 | |
Metrics on downstream service usage | ⚪ Not Implemented | |
Level 4 | Prepared Kibana dashboards | ⚪ Not Implemented |
Dashboards linked from metrics catalogs | ⚪ Not Implemented | |
Automatic alert routing | ⚪ Not Implemented | |
Level 5 | Long-term forecasting utilization and usage | ⚪ Not Implemented |
70% of requests covered by at least one SLI | ⚪ Not Implemented | |
Automatic alert routing | ⚪ Not Implemented |
camoproxy detail
Level | Criterion | Passed |
---|---|---|
Level 1 | Exists in the service catalog | ✅ 1 |
Structured logs available in Kibana | ✅ 1, 2 | |
Service exists in the dependency graph | ➖ Reason: Camoproxy does not interact directly with any declared services in our system | |
Level 2 | SLO monitoring: apdex | ✅ 1 |
SLO monitoring: error rate | ✅ 1 | |
SLO monitoring: request rate | ✅ 1 | |
Level 3 | Service health dashboards | ✅ 1 |
SLA calculations driven from SLO metrics | ➖ Reason: Service is not user facing | |
All components include an apdex | ❌ | |
Logging includes metadata for measuring scalability | ⚪ Not Implemented | |
Developer guides exist in developer documentation | ✅ 1 | |
SRE guides exist in runbooks | ✅ 1 | |
Metrics on downstream service usage | ⚪ Not Implemented | |
Level 4 | Prepared Kibana dashboards | ⚪ Not Implemented |
Dashboards linked from metrics catalogs | ⚪ Not Implemented | |
Automatic alert routing | ⚪ Not Implemented | |
Level 5 | Long-term forecasting utilization and usage | ⚪ Not Implemented |
70% of requests covered by at least one SLI | ⚪ Not Implemented | |
Automatic alert routing | ⚪ Not Implemented |
ci-runners detail
Level | Criterion | Passed |
---|---|---|
Level 1 | Exists in the service catalog | ✅ 1 |
Structured logs available in Kibana | ✅ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12 | |
Service exists in the dependency graph | ✅ 1 | |
Level 2 | SLO monitoring: apdex | ✅ 1 |
SLO monitoring: error rate | ✅ 1 | |
SLO monitoring: request rate | ✅ 1 | |
Level 3 | Service health dashboards | ✅ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11 |
SLA calculations driven from SLO metrics | ⚪ Not Implemented | |
All components include an apdex | ❌ | |
Logging includes metadata for measuring scalability | ⚪ Not Implemented | |
Developer guides exist in developer documentation | ✅ 1 | |
SRE guides exist in runbooks | ✅ 1 | |
Metrics on downstream service usage | ⚪ Not Implemented | |
Level 4 | Prepared Kibana dashboards | ⚪ Not Implemented |
Dashboards linked from metrics catalogs | ⚪ Not Implemented | |
Automatic alert routing | ⚪ Not Implemented | |
Level 5 | Long-term forecasting utilization and usage | ⚪ Not Implemented |
70% of requests covered by at least one SLI | ⚪ Not Implemented | |
Automatic alert routing | ⚪ Not Implemented |
cloud-sql detail
Level | Criterion | Passed |
---|---|---|
Level 1 | Exists in the service catalog | ✅ 1 |
Structured logs available in Kibana | ➖ Reason: Cloud SQL is a managed service of GCP. The logs are available in Stackdriver. | |
Service exists in the dependency graph | ✅ 1 | |
Level 2 | SLO monitoring: apdex | ❌ |
SLO monitoring: error rate | ❌ | |
SLO monitoring: request rate | ❌ | |
Level 3 | Service health dashboards | ✅ 1 |
SLA calculations driven from SLO metrics | ➖ Reason: Service is not user facing | |
All components include an apdex | ❌ | |
Logging includes metadata for measuring scalability | ⚪ Not Implemented | |
Developer guides exist in developer documentation | ➖ Reason: Cloud SQL is an infrastructure component, powered by GCP | |
SRE guides exist in runbooks | ✅ 1 | |
Metrics on downstream service usage | ⚪ Not Implemented | |
Level 4 | Prepared Kibana dashboards | ⚪ Not Implemented |
Dashboards linked from metrics catalogs | ⚪ Not Implemented | |
Automatic alert routing | ⚪ Not Implemented | |
Level 5 | Long-term forecasting utilization and usage | ⚪ Not Implemented |
70% of requests covered by at least one SLI | ⚪ Not Implemented | |
Automatic alert routing | ⚪ Not Implemented |
cloudflare detail
Level | Criterion | Passed |
---|---|---|
Level 1 | Exists in the service catalog | ✅ 1 |
Structured logs available in Kibana | ➖ Reason: Logs from CloudFlare are pushed to a GCS bucket by CloudFlare, and not ingested to ElasticSearch due to volume. See https://gitlab.com/gitlab-com/runbooks/-/blob/master/docs/cloudflare/logging.md for alternatives | |
Service exists in the dependency graph | ✅ 1 | |
Level 2 | SLO monitoring: apdex | ❌ |
SLO monitoring: error rate | ✅ 1 | |
SLO monitoring: request rate | ✅ 1 | |
Level 3 | Service health dashboards | ✅ 1 |
SLA calculations driven from SLO metrics | ➖ Reason: Service is not user facing | |
All components include an apdex | ❌ | |
Logging includes metadata for measuring scalability | ⚪ Not Implemented | |
Developer guides exist in developer documentation | ➖ Reason: WAF is an infrastructure component, powered by Cloudflare | |
SRE guides exist in runbooks | ✅ 1 | |
Metrics on downstream service usage | ⚪ Not Implemented | |
Level 4 | Prepared Kibana dashboards | ⚪ Not Implemented |
Dashboards linked from metrics catalogs | ⚪ Not Implemented | |
Automatic alert routing | ⚪ Not Implemented | |
Level 5 | Long-term forecasting utilization and usage | ⚪ Not Implemented |
70% of requests covered by at least one SLI | ⚪ Not Implemented | |
Automatic alert routing | ⚪ Not Implemented |
consul detail
Level | Criterion | Passed |
---|---|---|
Level 1 | Exists in the service catalog | ✅ 1 |
Structured logs available in Kibana | ✅ 1 | |
Service exists in the dependency graph | ✅ 1 | |
Level 2 | SLO monitoring: apdex | ❌ |
SLO monitoring: error rate | ✅ 1 | |
SLO monitoring: request rate | ✅ 1 | |
Level 3 | Service health dashboards | ✅ 1, 2, 3, 4, 5, 6 |
SLA calculations driven from SLO metrics | ➖ Reason: Service is not user facing | |
All components include an apdex | ❌ | |
Logging includes metadata for measuring scalability | ⚪ Not Implemented | |
Developer guides exist in developer documentation | ➖ Reason: Consul is an infrastructure component, developers do not interact with it | |
SRE guides exist in runbooks | ✅ 1 | |
Metrics on downstream service usage | ⚪ Not Implemented | |
Level 4 | Prepared Kibana dashboards | ⚪ Not Implemented |
Dashboards linked from metrics catalogs | ⚪ Not Implemented | |
Automatic alert routing | ⚪ Not Implemented | |
Level 5 | Long-term forecasting utilization and usage | ⚪ Not Implemented |
70% of requests covered by at least one SLI | ⚪ Not Implemented | |
Automatic alert routing | ⚪ Not Implemented |
customersdot detail
Level | Criterion | Passed |
---|---|---|
Level 1 | Exists in the service catalog | ✅ 1 |
Structured logs available in Kibana | ➖ Reason: All logs are available in Stackdriver | |
Service exists in the dependency graph | ✅ 1 | |
Level 2 | SLO monitoring: apdex | ✅ 1 |
SLO monitoring: error rate | ✅ 1 | |
SLO monitoring: request rate | ✅ 1 | |
Level 3 | Service health dashboards | ✅ 1 |
SLA calculations driven from SLO metrics | ➖ Reason: Service is not user facing | |
All components include an apdex | ✅ 1 | |
Logging includes metadata for measuring scalability | ⚪ Not Implemented | |
Developer guides exist in developer documentation | ✅ 1 | |
SRE guides exist in runbooks | ✅ 1 | |
Metrics on downstream service usage | ⚪ Not Implemented | |
Level 4 | Prepared Kibana dashboards | ⚪ Not Implemented |
Dashboards linked from metrics catalogs | ⚪ Not Implemented | |
Automatic alert routing | ⚪ Not Implemented | |
Level 5 | Long-term forecasting utilization and usage | ⚪ Not Implemented |
70% of requests covered by at least one SLI | ⚪ Not Implemented | |
Automatic alert routing | ⚪ Not Implemented |
errortracking detail
Level | Criterion | Passed |
---|---|---|
Level 1 | Exists in the service catalog | ✅ 1 |
Structured logs available in Kibana | ✅ 1 | |
Service exists in the dependency graph | ✅ 1 | |
Level 2 | SLO monitoring: apdex | ✅ 1 |
SLO monitoring: error rate | ✅ 1 | |
SLO monitoring: request rate | ✅ 1 | |
Level 3 | Service health dashboards | ✅ 1 |
SLA calculations driven from SLO metrics | ➖ Reason: Service is not user facing | |
All components include an apdex | ✅ 1 | |
Logging includes metadata for measuring scalability | ⚪ Not Implemented | |
Developer guides exist in developer documentation | ✅ 1 | |
SRE guides exist in runbooks | ❌ | |
Metrics on downstream service usage | ⚪ Not Implemented | |
Level 4 | Prepared Kibana dashboards | ⚪ Not Implemented |
Dashboards linked from metrics catalogs | ⚪ Not Implemented | |
Automatic alert routing | ⚪ Not Implemented | |
Level 5 | Long-term forecasting utilization and usage | ⚪ Not Implemented |
70% of requests covered by at least one SLI | ⚪ Not Implemented | |
Automatic alert routing | ⚪ Not Implemented |
ext-pvs detail
Level | Criterion | Passed |
---|---|---|
Level 1 | Exists in the service catalog | ✅ 1 |
Structured logs available in Kibana | ➖ Reason: Runway structured logs are temporarily available in Stackdriver | |
Service exists in the dependency graph | ➖ Reason: Runway services are deployed outside of the monolith | |
Level 2 | SLO monitoring: apdex | ✅ 1 |
SLO monitoring: error rate | ✅ 1 | |
SLO monitoring: request rate | ✅ 1 | |
Level 3 | Service health dashboards | ✅ 1 |
SLA calculations driven from SLO metrics | ➖ Reason: Service is not user facing | |
All components include an apdex | ✅ 1 | |
Logging includes metadata for measuring scalability | ⚪ Not Implemented | |
Developer guides exist in developer documentation | ✅ 1 | |
SRE guides exist in runbooks | ✅ 1 | |
Metrics on downstream service usage | ⚪ Not Implemented | |
Level 4 | Prepared Kibana dashboards | ⚪ Not Implemented |
Dashboards linked from metrics catalogs | ⚪ Not Implemented | |
Automatic alert routing | ⚪ Not Implemented | |
Level 5 | Long-term forecasting utilization and usage | ⚪ Not Implemented |
70% of requests covered by at least one SLI | ⚪ Not Implemented | |
Automatic alert routing | ⚪ Not Implemented |
external-dns detail
Level | Criterion | Passed |
---|---|---|
Level 1 | Exists in the service catalog | ✅ 1 |
Structured logs available in Kibana | ➖ Reason: Logs from external-dns are not ingested to ElasticSearch due to volume. Besides, the logs are also available in Stackdriver | |
Service exists in the dependency graph | ✅ 1 | |
Level 2 | SLO monitoring: apdex | ❌ |
SLO monitoring: error rate | ❌ | |
SLO monitoring: request rate | ❌ | |
Level 3 | Service health dashboards | ✅ 1 |
SLA calculations driven from SLO metrics | ➖ Reason: Service is not user facing | |
All components include an apdex | ❌ | |
Logging includes metadata for measuring scalability | ⚪ Not Implemented | |
Developer guides exist in developer documentation | ➖ Reason: external-dns is an infrastructure component, developers do not interact with it | |
SRE guides exist in runbooks | ✅ 1 | |
Metrics on downstream service usage | ⚪ Not Implemented | |
Level 4 | Prepared Kibana dashboards | ⚪ Not Implemented |
Dashboards linked from metrics catalogs | ⚪ Not Implemented | |
Automatic alert routing | ⚪ Not Implemented | |
Level 5 | Long-term forecasting utilization and usage | ⚪ Not Implemented |
70% of requests covered by at least one SLI | ⚪ Not Implemented | |
Automatic alert routing | ⚪ Not Implemented |
frontend detail
Level | Criterion | Passed |
---|---|---|
Level 1 | Exists in the service catalog | ✅ 1 |
Structured logs available in Kibana | ➖ Reason: Logs from HAProxy are available in BigQuery, and not ingested to ElasticSearch due to volume. | |
Service exists in the dependency graph | ✅ 1 | |
Level 2 | SLO monitoring: apdex | ❌ |
SLO monitoring: error rate | ✅ 1 | |
SLO monitoring: request rate | ✅ 1 | |
Level 3 | Service health dashboards | ✅ 1, 2 |
SLA calculations driven from SLO metrics | ➖ Reason: Service is not user facing | |
All components include an apdex | ❌ | |
Logging includes metadata for measuring scalability | ⚪ Not Implemented | |
Developer guides exist in developer documentation | ✅ 1 | |
SRE guides exist in runbooks | ✅ 1 | |
Metrics on downstream service usage | ⚪ Not Implemented | |
Level 4 | Prepared Kibana dashboards | ⚪ Not Implemented |
Dashboards linked from metrics catalogs | ⚪ Not Implemented | |
Automatic alert routing | ⚪ Not Implemented | |
Level 5 | Long-term forecasting utilization and usage | ⚪ Not Implemented |
70% of requests covered by at least one SLI | ⚪ Not Implemented | |
Automatic alert routing | ⚪ Not Implemented |
git detail
Level | Criterion | Passed |
---|---|---|
Level 1 | Exists in the service catalog | ✅ 1 |
Structured logs available in Kibana | ✅ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13 | |
Service exists in the dependency graph | ✅ 1 | |
Level 2 | SLO monitoring: apdex | ✅ 1 |
SLO monitoring: error rate | ✅ 1 | |
SLO monitoring: request rate | ✅ 1 | |
Level 3 | Service health dashboards | ✅ 1, 2, 3, 4, 5, 6, 7, 8 |
SLA calculations driven from SLO metrics | ⚪ Not Implemented | |
All components include an apdex | ❌ | |
Logging includes metadata for measuring scalability | ⚪ Not Implemented | |
Developer guides exist in developer documentation | ✅ 1 | |
SRE guides exist in runbooks | ✅ 1 | |
Metrics on downstream service usage | ⚪ Not Implemented | |
Level 4 | Prepared Kibana dashboards | ⚪ Not Implemented |
Dashboards linked from metrics catalogs | ⚪ Not Implemented | |
Automatic alert routing | ⚪ Not Implemented | |
Level 5 | Long-term forecasting utilization and usage | ⚪ Not Implemented |
70% of requests covered by at least one SLI | ⚪ Not Implemented | |
Automatic alert routing | ⚪ Not Implemented |
gitaly detail
Level | Criterion | Passed |
---|---|---|
Level 1 | Exists in the service catalog | ✅ 1 |
Structured logs available in Kibana | ✅ 1, 2, 3 | |
Service exists in the dependency graph | ✅ 1 | |
Level 2 | SLO monitoring: apdex | ✅ 1 |
SLO monitoring: error rate | ✅ 1 | |
SLO monitoring: request rate | ✅ 1 | |
Level 3 | Service health dashboards | ✅ 1, 2, 3, 4 |
SLA calculations driven from SLO metrics | ➖ Reason: Service is not user facing | |
All components include an apdex | ✅ 1 | |
Logging includes metadata for measuring scalability | ⚪ Not Implemented | |
Developer guides exist in developer documentation | ✅ 1 | |
SRE guides exist in runbooks | ✅ 1 | |
Metrics on downstream service usage | ⚪ Not Implemented | |
Level 4 | Prepared Kibana dashboards | ⚪ Not Implemented |
Dashboards linked from metrics catalogs | ⚪ Not Implemented | |
Automatic alert routing | ⚪ Not Implemented | |
Level 5 | Long-term forecasting utilization and usage | ⚪ Not Implemented |
70% of requests covered by at least one SLI | ⚪ Not Implemented | |
Automatic alert routing | ⚪ Not Implemented |
gitlab-static detail
Level | Criterion | Passed |
---|---|---|
Level 1 | Exists in the service catalog | ✅ 1 |
Structured logs available in Kibana | ➖ Reason: Logs from CloudFlare workers are available on-demand but they are not being ingested due to volume | |
Service exists in the dependency graph | ➖ Reason: This service is hosted by Cloudflare and does not depend on any other service | |
Level 2 | SLO monitoring: apdex | ❌ |
SLO monitoring: error rate | ✅ 1 | |
SLO monitoring: request rate | ✅ 1 | |
Level 3 | Service health dashboards | ✅ 1 |
SLA calculations driven from SLO metrics | ➖ Reason: Service is not user facing | |
All components include an apdex | ❌ | |
Logging includes metadata for measuring scalability | ⚪ Not Implemented | |
Developer guides exist in developer documentation | ⚪ Not Implemented | |
SRE guides exist in runbooks | ✅ 1 | |
Metrics on downstream service usage | ⚪ Not Implemented | |
Level 4 | Prepared Kibana dashboards | ⚪ Not Implemented |
Dashboards linked from metrics catalogs | ⚪ Not Implemented | |
Automatic alert routing | ⚪ Not Implemented | |
Level 5 | Long-term forecasting utilization and usage | ⚪ Not Implemented |
70% of requests covered by at least one SLI | ⚪ Not Implemented | |
Automatic alert routing | ⚪ Not Implemented |
glgo detail
Level | Criterion | Passed |
---|---|---|
Level 1 | Exists in the service catalog | ✅ 1 |
Structured logs available in Kibana | ➖ Reason: Runway structured logs are temporarily available in Stackdriver | |
Service exists in the dependency graph | ➖ Reason: Runway services are deployed outside of the monolith | |
Level 2 | SLO monitoring: apdex | ✅ 1 |
SLO monitoring: error rate | ✅ 1 | |
SLO monitoring: request rate | ✅ 1 | |
Level 3 | Service health dashboards | ✅ 1 |
SLA calculations driven from SLO metrics | ➖ Reason: Service is not user facing | |
All components include an apdex | ❌ | |
Logging includes metadata for measuring scalability | ⚪ Not Implemented | |
Developer guides exist in developer documentation | ❌ | |
SRE guides exist in runbooks | ❌ | |
Metrics on downstream service usage | ⚪ Not Implemented | |
Level 4 | Prepared Kibana dashboards | ⚪ Not Implemented |
Dashboards linked from metrics catalogs | ⚪ Not Implemented | |
Automatic alert routing | ⚪ Not Implemented | |
Level 5 | Long-term forecasting utilization and usage | ⚪ Not Implemented |
70% of requests covered by at least one SLI | ⚪ Not Implemented | |
Automatic alert routing | ⚪ Not Implemented |
google-cloud-storage detail
Level | Criterion | Passed |
---|---|---|
Level 1 | Exists in the service catalog | ✅ 1 |
Structured logs available in Kibana | ➖ Reason: Access logs of GCS are not enabled due to volume. | |
Service exists in the dependency graph | ✅ 1 | |
Level 2 | SLO monitoring: apdex | ✅ 1 |
SLO monitoring: error rate | ✅ 1 | |
SLO monitoring: request rate | ✅ 1 | |
Level 3 | Service health dashboards | ✅ 1, 2 |
SLA calculations driven from SLO metrics | ➖ Reason: Service is not user facing | |
All components include an apdex | ❌ | |
Logging includes metadata for measuring scalability | ⚪ Not Implemented | |
Developer guides exist in developer documentation | ✅ 1 | |
SRE guides exist in runbooks | ✅ 1 | |
Metrics on downstream service usage | ⚪ Not Implemented | |
Level 4 | Prepared Kibana dashboards | ⚪ Not Implemented |
Dashboards linked from metrics catalogs | ⚪ Not Implemented | |
Automatic alert routing | ⚪ Not Implemented | |
Level 5 | Long-term forecasting utilization and usage | ⚪ Not Implemented |
70% of requests covered by at least one SLI | ⚪ Not Implemented | |
Automatic alert routing | ⚪ Not Implemented |
internal-api detail
Level | Criterion | Passed |
---|---|---|
Level 1 | Exists in the service catalog | ✅ 1 |
Structured logs available in Kibana | ✅ 1, 2, 3, 4, 5, 6 | |
Service exists in the dependency graph | ✅ 1 | |
Level 2 | SLO monitoring: apdex | ✅ 1 |
SLO monitoring: error rate | ✅ 1 | |
SLO monitoring: request rate | ✅ 1 | |
Level 3 | Service health dashboards | ✅ 1, 2, 3, 4, 5, 6, 7 |
SLA calculations driven from SLO metrics | ⚪ Not Implemented | |
All components include an apdex | ✅ 1 | |
Logging includes metadata for measuring scalability | ⚪ Not Implemented | |
Developer guides exist in developer documentation | ✅ 1 | |
SRE guides exist in runbooks | ✅ 1 | |
Metrics on downstream service usage | ⚪ Not Implemented | |
Level 4 | Prepared Kibana dashboards | ⚪ Not Implemented |
Dashboards linked from metrics catalogs | ⚪ Not Implemented | |
Automatic alert routing | ⚪ Not Implemented | |
Level 5 | Long-term forecasting utilization and usage | ⚪ Not Implemented |
70% of requests covered by at least one SLI | ⚪ Not Implemented | |
Automatic alert routing | ⚪ Not Implemented |
istio detail
Level | Criterion | Passed |
---|---|---|
Level 1 | Exists in the service catalog | ✅ 1 |
Structured logs available in Kibana | ➖ Reason: Istio service is not deployed in production | |
Service exists in the dependency graph | ➖ Reason: This service does not interact directly with any other services | |
Level 2 | SLO monitoring: apdex | ✅ 1 |
SLO monitoring: error rate | ✅ 1 | |
SLO monitoring: request rate | ✅ 1 | |
Level 3 | Service health dashboards | ✅ 1, 2, 3, 4, 5, 6, 7 |
SLA calculations driven from SLO metrics | ➖ Reason: Service is not user facing | |
All components include an apdex | ❌ | |
Logging includes metadata for measuring scalability | ⚪ Not Implemented | |
Developer guides exist in developer documentation | ➖ Reason: Istio is an infrastructure component, developers do not interact with it | |
SRE guides exist in runbooks | ✅ 1 | |
Metrics on downstream service usage | ⚪ Not Implemented | |
Level 4 | Prepared Kibana dashboards | ⚪ Not Implemented |
Dashboards linked from metrics catalogs | ⚪ Not Implemented | |
Automatic alert routing | ⚪ Not Implemented | |
Level 5 | Long-term forecasting utilization and usage | ⚪ Not Implemented |
70% of requests covered by at least one SLI | ⚪ Not Implemented | |
Automatic alert routing | ⚪ Not Implemented |
jaeger detail
Level | Criterion | Passed |
---|---|---|
Level 1 | Exists in the service catalog | ✅ 1 |
Structured logs available in Kibana | ➖ Reason: Jaeger service is not deployed in production | |
Service exists in the dependency graph | ➖ Reason: Jaeger is an independent internal observability tool | |
Level 2 | SLO monitoring: apdex | ✅ 1 |
SLO monitoring: error rate | ✅ 1 | |
SLO monitoring: request rate | ✅ 1 | |
Level 3 | Service health dashboards | ✅ 1 |
SLA calculations driven from SLO metrics | ➖ Reason: Service is not user facing | |
All components include an apdex | ❌ | |
Logging includes metadata for measuring scalability | ⚪ Not Implemented | |
Developer guides exist in developer documentation | ✅ 1 | |
SRE guides exist in runbooks | ✅ 1 | |
Metrics on downstream service usage | ⚪ Not Implemented | |
Level 4 | Prepared Kibana dashboards | ⚪ Not Implemented |
Dashboards linked from metrics catalogs | ⚪ Not Implemented | |
Automatic alert routing | ⚪ Not Implemented | |
Level 5 | Long-term forecasting utilization and usage | ⚪ Not Implemented |
70% of requests covered by at least one SLI | ⚪ Not Implemented | |
Automatic alert routing | ⚪ Not Implemented |
kas detail
Level | Criterion | Passed |
---|---|---|
Level 1 | Exists in the service catalog | ✅ 1 |
Structured logs available in Kibana | ✅ 1, 2 | |
Service exists in the dependency graph | ✅ 1 | |
Level 2 | SLO monitoring: apdex | ✅ 1 |
SLO monitoring: error rate | ✅ 1 | |
SLO monitoring: request rate | ✅ 1 | |
Level 3 | Service health dashboards | ✅ 1, 2, 3, 4, 5, 6, 7, 8 |
SLA calculations driven from SLO metrics | ➖ Reason: Service is not user facing | |
All components include an apdex | ✅ 1 | |
Logging includes metadata for measuring scalability | ⚪ Not Implemented | |
Developer guides exist in developer documentation | ✅ 1 | |
SRE guides exist in runbooks | ✅ 1 | |
Metrics on downstream service usage | ⚪ Not Implemented | |
Level 4 | Prepared Kibana dashboards | ⚪ Not Implemented |
Dashboards linked from metrics catalogs | ⚪ Not Implemented | |
Automatic alert routing | ⚪ Not Implemented | |
Level 5 | Long-term forecasting utilization and usage | ⚪ Not Implemented |
70% of requests covered by at least one SLI | ⚪ Not Implemented | |
Automatic alert routing | ⚪ Not Implemented |
kube detail
Level | Criterion | Passed |
---|---|---|
Level 1 | Exists in the service catalog | ✅ 1 |
Structured logs available in Kibana | ✅ 1 | |
Service exists in the dependency graph | ➖ Reason: This service is managed by GKE at the moment. It does not interact directly with any other services | |
Level 2 | SLO monitoring: apdex | ✅ 1 |
SLO monitoring: error rate | ✅ 1 | |
SLO monitoring: request rate | ✅ 1 | |
Level 3 | Service health dashboards | ✅ 1 |
SLA calculations driven from SLO metrics | ➖ Reason: Service is not user facing | |
All components include an apdex | ❌ | |
Logging includes metadata for measuring scalability | ⚪ Not Implemented | |
Developer guides exist in developer documentation | ➖ Reason: Application logic does not interact with kube | |
SRE guides exist in runbooks | ✅ 1 | |
Metrics on downstream service usage | ⚪ Not Implemented | |
Level 4 | Prepared Kibana dashboards | ⚪ Not Implemented |
Dashboards linked from metrics catalogs | ⚪ Not Implemented | |
Automatic alert routing | ⚪ Not Implemented | |
Level 5 | Long-term forecasting utilization and usage | ⚪ Not Implemented |
70% of requests covered by at least one SLI | ⚪ Not Implemented | |
Automatic alert routing | ⚪ Not Implemented |
logging detail
Level | Criterion | Passed |
---|---|---|
Level 1 | Exists in the service catalog | ✅ 1 |
Structured logs available in Kibana | ✅ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11 | |
Service exists in the dependency graph | ➖ Reason: The logging platform consumes logs via fluentd, but does not interact directly with any other services | |
Level 2 | SLO monitoring: apdex | ❌ |
SLO monitoring: error rate | ✅ 1 | |
SLO monitoring: request rate | ✅ 1 | |
Level 3 | Service health dashboards | ✅ 1, 2, 3, 4 |
SLA calculations driven from SLO metrics | ➖ Reason: Service is not user facing | |
All components include an apdex | ❌ | |
Logging includes metadata for measuring scalability | ⚪ Not Implemented | |
Developer guides exist in developer documentation | ✅ 1 | |
SRE guides exist in runbooks | ✅ 1 | |
Metrics on downstream service usage | ⚪ Not Implemented | |
Level 4 | Prepared Kibana dashboards | ⚪ Not Implemented |
Dashboards linked from metrics catalogs | ⚪ Not Implemented | |
Automatic alert routing | ⚪ Not Implemented | |
Level 5 | Long-term forecasting utilization and usage | ⚪ Not Implemented |
70% of requests covered by at least one SLI | ⚪ Not Implemented | |
Automatic alert routing | ⚪ Not Implemented |
mailgun detail
Level | Criterion | Passed |
---|---|---|
Level 1 | Exists in the service catalog | ✅ 1 |
Structured logs available in Kibana | ➖ Reason: Mailgun is a vendor | |
Service exists in the dependency graph | ➖ Reason: Mailgun is a vendor | |
Level 2 | SLO monitoring: apdex | ✅ 1 |
SLO monitoring: error rate | ✅ 1 | |
SLO monitoring: request rate | ✅ 1 | |
Level 3 | Service health dashboards | ✅ 1 |
SLA calculations driven from SLO metrics | ➖ Reason: Service is not user facing | |
All components include an apdex | ✅ 1 | |
Logging includes metadata for measuring scalability | ⚪ Not Implemented | |
Developer guides exist in developer documentation | ⚪ Not Implemented | |
SRE guides exist in runbooks | ⚪ Not Implemented | |
Metrics on downstream service usage | ⚪ Not Implemented | |
Level 4 | Prepared Kibana dashboards | ⚪ Not Implemented |
Dashboards linked from metrics catalogs | ⚪ Not Implemented | |
Automatic alert routing | ⚪ Not Implemented | |
Level 5 | Long-term forecasting utilization and usage | ⚪ Not Implemented |
70% of requests covered by at least one SLI | ⚪ Not Implemented | |
Automatic alert routing | ⚪ Not Implemented |
mailroom detail
Level | Criterion | Passed |
---|---|---|
Level 1 | Exists in the service catalog | ✅ 1 |
Structured logs available in Kibana | ✅ 1, 2, 3, 4, 5 | |
Service exists in the dependency graph | ✅ 1 | |
Level 2 | SLO monitoring: apdex | ❌ |
SLO monitoring: error rate | ✅ 1 | |
SLO monitoring: request rate | ✅ 1 | |
Level 3 | Service health dashboards | ✅ 1, 2, 3, 4, 5 |
SLA calculations driven from SLO metrics | ➖ Reason: Service is not user facing | |
All components include an apdex | ❌ | |
Logging includes metadata for measuring scalability | ⚪ Not Implemented | |
Developer guides exist in developer documentation | ✅ 1 | |
SRE guides exist in runbooks | ✅ 1 | |
Metrics on downstream service usage | ⚪ Not Implemented | |
Level 4 | Prepared Kibana dashboards | ⚪ Not Implemented |
Dashboards linked from metrics catalogs | ⚪ Not Implemented | |
Automatic alert routing | ⚪ Not Implemented | |
Level 5 | Long-term forecasting utilization and usage | ⚪ Not Implemented |
70% of requests covered by at least one SLI | ⚪ Not Implemented | |
Automatic alert routing | ⚪ Not Implemented |
memorystore detail
Level | Criterion | Passed |
---|---|---|
Level 1 | Exists in the service catalog | ✅ 1 |
Structured logs available in Kibana | ➖ Reason: Memorystore is a managed service of GCP. The logs are available in Stackdriver. | |
Service exists in the dependency graph | ✅ 1 | |
Level 2 | SLO monitoring: apdex | ❌ |
SLO monitoring: error rate | ❌ | |
SLO monitoring: request rate | ❌ | |
Level 3 | Service health dashboards | ✅ 1 |
SLA calculations driven from SLO metrics | ➖ Reason: Service is not user facing | |
All components include an apdex | ❌ | |
Logging includes metadata for measuring scalability | ⚪ Not Implemented | |
Developer guides exist in developer documentation | ➖ Reason: Memorystore is an infrastructure component, powered by GCP | |
SRE guides exist in runbooks | ✅ 1 | |
Metrics on downstream service usage | ⚪ Not Implemented | |
Level 4 | Prepared Kibana dashboards | ⚪ Not Implemented |
Dashboards linked from metrics catalogs | ⚪ Not Implemented | |
Automatic alert routing | ⚪ Not Implemented | |
Level 5 | Long-term forecasting utilization and usage | ⚪ Not Implemented |
70% of requests covered by at least one SLI | ⚪ Not Implemented | |
Automatic alert routing | ⚪ Not Implemented |
mimir detail
Level | Criterion | Passed |
---|---|---|
Level 1 | Exists in the service catalog | ✅ 1 |
Structured logs available in Kibana | ✅ 1, 2, 3, 4, 5, 6, 7, 8 | |
Service exists in the dependency graph | ➖ Reason: Mimir is an independent internal observability tool. It fetches metrics from other services but does not interact with them functionally. | |
Level 2 | SLO monitoring: apdex | ✅ 1 |
SLO monitoring: error rate | ✅ 1 | |
SLO monitoring: request rate | ✅ 1 | |
Level 3 | Service health dashboards | ✅ 1 |
SLA calculations driven from SLO metrics | ➖ Reason: Service is not user facing | |
All components include an apdex | ❌ | |
Logging includes metadata for measuring scalability | ⚪ Not Implemented | |
Developer guides exist in developer documentation | ⚪ Not Implemented | |
SRE guides exist in runbooks | ✅ 1 | |
Metrics on downstream service usage | ⚪ Not Implemented | |
Level 4 | Prepared Kibana dashboards | ⚪ Not Implemented |
Dashboards linked from metrics catalogs | ⚪ Not Implemented | |
Automatic alert routing | ⚪ Not Implemented | |
Level 5 | Long-term forecasting utilization and usage | ⚪ Not Implemented |
70% of requests covered by at least one SLI | ⚪ Not Implemented | |
Automatic alert routing | ⚪ Not Implemented |
monitoring detail
Level | Criterion | Passed |
---|---|---|
Level 1 | Exists in the service catalog | ✅ 1 |
Structured logs available in Kibana | ✅ 1, 2, 3, 4 | |
Service exists in the dependency graph | ✅ 1 | |
Level 2 | SLO monitoring: apdex | ✅ 1 |
SLO monitoring: error rate | ✅ 1 | |
SLO monitoring: request rate | ✅ 1 | |
Level 3 | Service health dashboards | ✅ 1, 2 |
SLA calculations driven from SLO metrics | ➖ Reason: Service is not user facing | |
All components include an apdex | ❌ | |
Logging includes metadata for measuring scalability | ⚪ Not Implemented | |
Developer guides exist in developer documentation | ✅ 1 | |
SRE guides exist in runbooks | ✅ 1 | |
Metrics on downstream service usage | ⚪ Not Implemented | |
Level 4 | Prepared Kibana dashboards | ⚪ Not Implemented |
Dashboards linked from metrics catalogs | ⚪ Not Implemented | |
Automatic alert routing | ⚪ Not Implemented | |
Level 5 | Long-term forecasting utilization and usage | ⚪ Not Implemented |
70% of requests covered by at least one SLI | ⚪ Not Implemented | |
Automatic alert routing | ⚪ Not Implemented |
nat detail
Level | Criterion | Passed |
---|---|---|
Level 1 | Exists in the service catalog | ✅ 1 |
Structured logs available in Kibana | ➖ Reason: NAT is managed by GCP, so the logs are available in Stackdriver. | |
Service exists in the dependency graph | ✅ 1 | |
Level 2 | SLO monitoring: apdex | ❌ |
SLO monitoring: error rate | ✅ 1 | |
SLO monitoring: request rate | ✅ 1 | |
Level 3 | Service health dashboards | ✅ 1, 2 |
SLA calculations driven from SLO metrics | ➖ Reason: Service is not user facing | |
All components include an apdex | ❌ | |
Logging includes metadata for measuring scalability | ⚪ Not Implemented | |
Developer guides exist in developer documentation | ➖ Reason: NAT is an infrastructure component; developers do not interact with it | |
SRE guides exist in runbooks | ✅ 1 | |
Metrics on downstream service usage | ⚪ Not Implemented | |
Level 4 | Prepared Kibana dashboards | ⚪ Not Implemented |
Dashboards linked from metrics catalogs | ⚪ Not Implemented | |
Automatic alert routing | ⚪ Not Implemented | |
Level 5 | Long-term forecasting utilization and usage | ⚪ Not Implemented |
70% of requests covered by at least one SLI | ⚪ Not Implemented | |
Automatic alert routing | ⚪ Not Implemented |
nginx detail
Level | Criterion | Passed |
---|---|---|
Level 1 | Exists in the service catalog | ✅ 1 |
Structured logs available in Kibana | ➖ Reason: Logs from nginx are not ingested into Elasticsearch due to volume. Usually, workhorse logs will cover the same ground. The logs are also available in Stackdriver. | |
Service exists in the dependency graph | ✅ 1 | |
Level 2 | SLO monitoring: apdex | ❌ |
SLO monitoring: error rate | ✅ 1 | |
SLO monitoring: request rate | ✅ 1 | |
Level 3 | Service health dashboards | ✅ 1, 2, 3, 4, 5, 6 |
SLA calculations driven from SLO metrics | ➖ Reason: Service is not user facing | |
All components include an apdex | ❌ | |
Logging includes metadata for measuring scalability | ⚪ Not Implemented | |
Developer guides exist in developer documentation | ➖ Reason: Application logic does not interact with nginx | |
SRE guides exist in runbooks | ✅ 1 | |
Metrics on downstream service usage | ⚪ Not Implemented | |
Level 4 | Prepared Kibana dashboards | ⚪ Not Implemented |
Dashboards linked from metrics catalogs | ⚪ Not Implemented | |
Automatic alert routing | ⚪ Not Implemented | |
Level 5 | Long-term forecasting utilization and usage | ⚪ Not Implemented |
70% of requests covered by at least one SLI | ⚪ Not Implemented | |
Automatic alert routing | ⚪ Not Implemented |
ops-gitlab-net detail
Level | Criterion | Passed |
---|---|---|
Level 1 | Exists in the service catalog | ✅ 1 |
Structured logs available in Kibana | ✅ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19 | |
Service exists in the dependency graph | ➖ Reason: ops.gitlab.net is a standalone GitLab deployment | |
Level 2 | SLO monitoring: apdex | ✅ 1 |
SLO monitoring: error rate | ✅ 1 | |
SLO monitoring: request rate | ✅ 1 | |
Level 3 | Service health dashboards | ✅ 1, 2, 3, 4 |
SLA calculations driven from SLO metrics | ➖ Reason: Service is not user facing | |
All components include an apdex | ✅ 1 | |
Logging includes metadata for measuring scalability | ⚪ Not Implemented | |
Developer guides exist in developer documentation | ⚪ Not Implemented | |
SRE guides exist in runbooks | ⚪ Not Implemented | |
Metrics on downstream service usage | ⚪ Not Implemented | |
Level 4 | Prepared Kibana dashboards | ⚪ Not Implemented |
Dashboards linked from metrics catalogs | ⚪ Not Implemented | |
Automatic alert routing | ⚪ Not Implemented | |
Level 5 | Long-term forecasting utilization and usage | ⚪ Not Implemented |
70% of requests covered by at least one SLI | ⚪ Not Implemented | |
Automatic alert routing | ⚪ Not Implemented |
packagecloud detail
Level | Criterion | Passed |
---|---|---|
Level 1 | Exists in the service catalog | ✅ 1 |
Structured logs available in Kibana | ✅ 1, 2 | |
Service exists in the dependency graph | ✅ 1 | |
Level 2 | SLO monitoring: apdex | ❌ |
SLO monitoring: error rate | ✅ 1 | |
SLO monitoring: request rate | ✅ 1 | |
Level 3 | Service health dashboards | ✅ 1, 2, 3, 4 |
SLA calculations driven from SLO metrics | ➖ Reason: Service is not user facing | |
All components include an apdex | ❌ | |
Logging includes metadata for measuring scalability | ⚪ Not Implemented | |
Developer guides exist in developer documentation | ⚪ Not Implemented | |
SRE guides exist in runbooks | ✅ 1 | |
Metrics on downstream service usage | ⚪ Not Implemented | |
Level 4 | Prepared Kibana dashboards | ⚪ Not Implemented |
Dashboards linked from metrics catalogs | ⚪ Not Implemented | |
Automatic alert routing | ⚪ Not Implemented | |
Level 5 | Long-term forecasting utilization and usage | ⚪ Not Implemented |
70% of requests covered by at least one SLI | ⚪ Not Implemented | |
Automatic alert routing | ⚪ Not Implemented |
patroni detail
Level | Criterion | Passed |
---|---|---|
Level 1 | Exists in the service catalog | ✅ 1 |
Structured logs available in Kibana | ✅ 1, 2, 3, 4, 5 | |
Service exists in the dependency graph | ✅ 1 | |
Level 2 | SLO monitoring: apdex | ✅ 1 |
SLO monitoring: error rate | ✅ 1 | |
SLO monitoring: request rate | ✅ 1 | |
Level 3 | Service health dashboards | ✅ 1, 2, 3 |
SLA calculations driven from SLO metrics | ➖ Reason: Service is not user facing | |
All components include an apdex | ❌ | |
Logging includes metadata for measuring scalability | ⚪ Not Implemented | |
Developer guides exist in developer documentation | ➖ Reason: patroni is an infrastructure component; developers do not interact with it | |
SRE guides exist in runbooks | ✅ 1 | |
Metrics on downstream service usage | ⚪ Not Implemented | |
Level 4 | Prepared Kibana dashboards | ⚪ Not Implemented |
Dashboards linked from metrics catalogs | ⚪ Not Implemented | |
Automatic alert routing | ⚪ Not Implemented | |
Level 5 | Long-term forecasting utilization and usage | ⚪ Not Implemented |
70% of requests covered by at least one SLI | ⚪ Not Implemented | |
Automatic alert routing | ⚪ Not Implemented |
patroni-ci detail
Level | Criterion | Passed |
---|---|---|
Level 1 | Exists in the service catalog | ✅ 1 |
Structured logs available in Kibana | ✅ 1, 2, 3, 4, 5 | |
Service exists in the dependency graph | ✅ 1 | |
Level 2 | SLO monitoring: apdex | ✅ 1 |
SLO monitoring: error rate | ✅ 1 | |
SLO monitoring: request rate | ✅ 1 | |
Level 3 | Service health dashboards | ✅ 1 |
SLA calculations driven from SLO metrics | ➖ Reason: Service is not user facing | |
All components include an apdex | ❌ | |
Logging includes metadata for measuring scalability | ⚪ Not Implemented | |
Developer guides exist in developer documentation | ➖ Reason: patroni is an infrastructure component; developers do not interact with it | |
SRE guides exist in runbooks | ✅ 1 | |
Metrics on downstream service usage | ⚪ Not Implemented | |
Level 4 | Prepared Kibana dashboards | ⚪ Not Implemented |
Dashboards linked from metrics catalogs | ⚪ Not Implemented | |
Automatic alert routing | ⚪ Not Implemented | |
Level 5 | Long-term forecasting utilization and usage | ⚪ Not Implemented |
70% of requests covered by at least one SLI | ⚪ Not Implemented | |
Automatic alert routing | ⚪ Not Implemented |
patroni-embedding detail
Level | Criterion | Passed |
---|---|---|
Level 1 | Exists in the service catalog | ✅ 1 |
Structured logs available in Kibana | ✅ 1, 2, 3, 4, 5 | |
Service exists in the dependency graph | ✅ 1 | |
Level 2 | SLO monitoring: apdex | ❌ |
SLO monitoring: error rate | ✅ 1 | |
SLO monitoring: request rate | ✅ 1 | |
Level 3 | Service health dashboards | ✅ 1 |
SLA calculations driven from SLO metrics | ➖ Reason: Service is not user facing | |
All components include an apdex | ❌ | |
Logging includes metadata for measuring scalability | ⚪ Not Implemented | |
Developer guides exist in developer documentation | ➖ Reason: patroni is an infrastructure component; developers do not interact with it | |
SRE guides exist in runbooks | ✅ 1 | |
Metrics on downstream service usage | ⚪ Not Implemented | |
Level 4 | Prepared Kibana dashboards | ⚪ Not Implemented |
Dashboards linked from metrics catalogs | ⚪ Not Implemented | |
Automatic alert routing | ⚪ Not Implemented | |
Level 5 | Long-term forecasting utilization and usage | ⚪ Not Implemented |
70% of requests covered by at least one SLI | ⚪ Not Implemented | |
Automatic alert routing | ⚪ Not Implemented |
patroni-registry detail
Level | Criterion | Passed |
---|---|---|
Level 1 | Exists in the service catalog | ✅ 1 |
Structured logs available in Kibana | ✅ 1, 2, 3, 4, 5 | |
Service exists in the dependency graph | ✅ 1 | |
Level 2 | SLO monitoring: apdex | ❌ |
SLO monitoring: error rate | ✅ 1 | |
SLO monitoring: request rate | ✅ 1 | |
Level 3 | Service health dashboards | ✅ 1 |
SLA calculations driven from SLO metrics | ➖ Reason: Service is not user facing | |
All components include an apdex | ❌ | |
Logging includes metadata for measuring scalability | ⚪ Not Implemented | |
Developer guides exist in developer documentation | ➖ Reason: patroni is an infrastructure component; developers do not interact with it | |
SRE guides exist in runbooks | ✅ 1 | |
Metrics on downstream service usage | ⚪ Not Implemented | |
Level 4 | Prepared Kibana dashboards | ⚪ Not Implemented |
Dashboards linked from metrics catalogs | ⚪ Not Implemented | |
Automatic alert routing | ⚪ Not Implemented | |
Level 5 | Long-term forecasting utilization and usage | ⚪ Not Implemented |
70% of requests covered by at least one SLI | ⚪ Not Implemented | |
Automatic alert routing | ⚪ Not Implemented |
pgbouncer detail
Level | Criterion | Passed |
---|---|---|
Level 1 | Exists in the service catalog | ✅ 1 |
Structured logs available in Kibana | ✅ 1 | |
Service exists in the dependency graph | ✅ 1 | |
Level 2 | SLO monitoring: apdex | ❌ |
SLO monitoring: error rate | ✅ 1 | |
SLO monitoring: request rate | ✅ 1 | |
Level 3 | Service health dashboards | ✅ 1, 2 |
SLA calculations driven from SLO metrics | ➖ Reason: Service is not user facing | |
All components include an apdex | ❌ | |
Logging includes metadata for measuring scalability | ⚪ Not Implemented | |
Developer guides exist in developer documentation | ➖ Reason: pgbouncer is an infrastructure component; developers do not interact with it | |
SRE guides exist in runbooks | ✅ 1 | |
Metrics on downstream service usage | ⚪ Not Implemented | |
Level 4 | Prepared Kibana dashboards | ⚪ Not Implemented |
Dashboards linked from metrics catalogs | ⚪ Not Implemented | |
Automatic alert routing | ⚪ Not Implemented | |
Level 5 | Long-term forecasting utilization and usage | ⚪ Not Implemented |
70% of requests covered by at least one SLI | ⚪ Not Implemented | |
Automatic alert routing | ⚪ Not Implemented |
pgbouncer-ci detail
Level | Criterion | Passed |
---|---|---|
Level 1 | Exists in the service catalog | ✅ 1 |
Structured logs available in Kibana | ✅ 1 | |
Service exists in the dependency graph | ✅ 1 | |
Level 2 | SLO monitoring: apdex | ❌ |
SLO monitoring: error rate | ✅ 1 | |
SLO monitoring: request rate | ✅ 1 | |
Level 3 | Service health dashboards | ✅ 1 |
SLA calculations driven from SLO metrics | ➖ Reason: Service is not user facing | |
All components include an apdex | ❌ | |
Logging includes metadata for measuring scalability | ⚪ Not Implemented | |
Developer guides exist in developer documentation | ➖ Reason: pgbouncer is an infrastructure component; developers do not interact with it | |
SRE guides exist in runbooks | ✅ 1 | |
Metrics on downstream service usage | ⚪ Not Implemented | |
Level 4 | Prepared Kibana dashboards | ⚪ Not Implemented |
Dashboards linked from metrics catalogs | ⚪ Not Implemented | |
Automatic alert routing | ⚪ Not Implemented | |
Level 5 | Long-term forecasting utilization and usage | ⚪ Not Implemented |
70% of requests covered by at least one SLI | ⚪ Not Implemented | |
Automatic alert routing | ⚪ Not Implemented |
pgbouncer-embedding detail
Level | Criterion | Passed |
---|---|---|
Level 1 | Exists in the service catalog | ✅ 1 |
Structured logs available in Kibana | ✅ 1 | |
Service exists in the dependency graph | ✅ 1 | |
Level 2 | SLO monitoring: apdex | ❌ |
SLO monitoring: error rate | ✅ 1 | |
SLO monitoring: request rate | ✅ 1 | |
Level 3 | Service health dashboards | ✅ 1 |
SLA calculations driven from SLO metrics | ➖ Reason: Service is not user facing | |
All components include an apdex | ❌ | |
Logging includes metadata for measuring scalability | ⚪ Not Implemented | |
Developer guides exist in developer documentation | ➖ Reason: pgbouncer is an infrastructure component; developers do not interact with it | |
SRE guides exist in runbooks | ✅ 1 | |
Metrics on downstream service usage | ⚪ Not Implemented | |
Level 4 | Prepared Kibana dashboards | ⚪ Not Implemented |
Dashboards linked from metrics catalogs | ⚪ Not Implemented | |
Automatic alert routing | ⚪ Not Implemented | |
Level 5 | Long-term forecasting utilization and usage | ⚪ Not Implemented |
70% of requests covered by at least one SLI | ⚪ Not Implemented | |
Automatic alert routing | ⚪ Not Implemented |
pgbouncer-registry detail
Level | Criterion | Passed |
---|---|---|
Level 1 | Exists in the service catalog | ✅ 1 |
Structured logs available in Kibana | ✅ 1 | |
Service exists in the dependency graph | ✅ 1 | |
Level 2 | SLO monitoring: apdex | ❌ |
SLO monitoring: error rate | ✅ 1 | |
SLO monitoring: request rate | ✅ 1 | |
Level 3 | Service health dashboards | ✅ 1 |
SLA calculations driven from SLO metrics | ➖ Reason: Service is not user facing | |
All components include an apdex | ❌ | |
Logging includes metadata for measuring scalability | ⚪ Not Implemented | |
Developer guides exist in developer documentation | ➖ Reason: pgbouncer is an infrastructure component; developers do not interact with it | |
SRE guides exist in runbooks | ✅ 1 | |
Metrics on downstream service usage | ⚪ Not Implemented | |
Level 4 | Prepared Kibana dashboards | ⚪ Not Implemented |
Dashboards linked from metrics catalogs | ⚪ Not Implemented | |
Automatic alert routing | ⚪ Not Implemented | |
Level 5 | Long-term forecasting utilization and usage | ⚪ Not Implemented |
70% of requests covered by at least one SLI | ⚪ Not Implemented | |
Automatic alert routing | ⚪ Not Implemented |
plantuml detail
Level | Criterion | Passed |
---|---|---|
Level 1 | Exists in the service catalog | ✅ 1 |
Structured logs available in Kibana | ➖ Reason: The logs are available in Stackdriver. | |
Service exists in the dependency graph | ➖ Reason: PlantUML is a stateless web application that generates UML diagrams on the fly. The rendered markdown points to the PlantUML server in the frontends. It does not interact with any declared services. | |
Level 2 | SLO monitoring: apdex | ❌ |
SLO monitoring: error rate | ✅ 1 | |
SLO monitoring: request rate | ✅ 1 | |
Level 3 | Service health dashboards | ✅ 1, 2, 3, 4 |
SLA calculations driven from SLO metrics | ➖ Reason: Service is not user facing | |
All components include an apdex | ❌ | |
Logging includes metadata for measuring scalability | ⚪ Not Implemented | |
Developer guides exist in developer documentation | ✅ 1 | |
SRE guides exist in runbooks | ✅ 1 | |
Metrics on downstream service usage | ⚪ Not Implemented | |
Level 4 | Prepared Kibana dashboards | ⚪ Not Implemented |
Dashboards linked from metrics catalogs | ⚪ Not Implemented | |
Automatic alert routing | ⚪ Not Implemented | |
Level 5 | Long-term forecasting utilization and usage | ⚪ Not Implemented |
70% of requests covered by at least one SLI | ⚪ Not Implemented | |
Automatic alert routing | ⚪ Not Implemented |
postgres-archive detail
Level | Criterion | Passed |
---|---|---|
Level 1 | Exists in the service catalog | ✅ 1 |
Structured logs available in Kibana | ✅ 1, 2 | |
Service exists in the dependency graph | ✅ 1 | |
Level 2 | SLO monitoring: apdex | ❌ |
SLO monitoring: error rate | ✅ 1 | |
SLO monitoring: request rate | ✅ 1 | |
Level 3 | Service health dashboards | ✅ 1 |
SLA calculations driven from SLO metrics | ➖ Reason: Service is not user facing | |
All components include an apdex | ❌ | |
Logging includes metadata for measuring scalability | ⚪ Not Implemented | |
Developer guides exist in developer documentation | ➖ Reason: postgres-archive is an infrastructure component; developers do not interact with it | |
SRE guides exist in runbooks | ✅ 1 | |
Metrics on downstream service usage | ⚪ Not Implemented | |
Level 4 | Prepared Kibana dashboards | ⚪ Not Implemented |
Dashboards linked from metrics catalogs | ⚪ Not Implemented | |
Automatic alert routing | ⚪ Not Implemented | |
Level 5 | Long-term forecasting utilization and usage | ⚪ Not Implemented |
70% of requests covered by at least one SLI | ⚪ Not Implemented | |
Automatic alert routing | ⚪ Not Implemented |
redis detail
Level | Criterion | Passed |
---|---|---|
Level 1 | Exists in the service catalog | ✅ 1 |
Structured logs available in Kibana | ✅ 1, 2 | |
Service exists in the dependency graph | ✅ 1 | |
Level 2 | SLO monitoring: apdex | ✅ 1 |
SLO monitoring: error rate | ✅ 1 | |
SLO monitoring: request rate | ✅ 1 | |
Level 3 | Service health dashboards | ✅ 1 |
SLA calculations driven from SLO metrics | ➖ Reason: Service is not user facing | |
All components include an apdex | ✅ 1 | |
Logging includes metadata for measuring scalability | ➖ Reason: Metadata can't be injected in redis logs | |
Developer guides exist in developer documentation | ✅ 1 | |
SRE guides exist in runbooks | ✅ 1 | |
Metrics on downstream service usage | ⚪ Not Implemented | |
Level 4 | Prepared Kibana dashboards | ⚪ Not Implemented |
Dashboards linked from metrics catalogs | ⚪ Not Implemented | |
Automatic alert routing | ⚪ Not Implemented | |
Level 5 | Long-term forecasting utilization and usage | ⚪ Not Implemented |
70% of requests covered by at least one SLI | ⚪ Not Implemented | |
Automatic alert routing | ⚪ Not Implemented |
redis-cluster-cache detail
Level | Criterion | Passed |
---|---|---|
Level 1 | Exists in the service catalog | ✅ 1 |
Structured logs available in Kibana | ✅ 1, 2 | |
Service exists in the dependency graph | ✅ 1 | |
Level 2 | SLO monitoring: apdex | ✅ 1 |
SLO monitoring: error rate | ✅ 1 | |
SLO monitoring: request rate | ✅ 1 | |
Level 3 | Service health dashboards | ✅ 1 |
SLA calculations driven from SLO metrics | ➖ Reason: Service is not user facing | |
All components include an apdex | ✅ 1 | |
Logging includes metadata for measuring scalability | ➖ Reason: Metadata can't be injected in redis logs | |
Developer guides exist in developer documentation | ✅ 1 | |
SRE guides exist in runbooks | ✅ 1 | |
Metrics on downstream service usage | ⚪ Not Implemented | |
Level 4 | Prepared Kibana dashboards | ⚪ Not Implemented |
Dashboards linked from metrics catalogs | ⚪ Not Implemented | |
Automatic alert routing | ⚪ Not Implemented | |
Level 5 | Long-term forecasting utilization and usage | ⚪ Not Implemented |
70% of requests covered by at least one SLI | ⚪ Not Implemented | |
Automatic alert routing | ⚪ Not Implemented |
redis-cluster-chat-cache detail
Level | Criterion | Passed |
---|---|---|
Level 1 | Exists in the service catalog | ✅ 1 |
Structured logs available in Kibana | ✅ 1, 2 | |
Service exists in the dependency graph | ✅ 1 | |
Level 2 | SLO monitoring: apdex | ✅ 1 |
SLO monitoring: error rate | ✅ 1 | |
SLO monitoring: request rate | ✅ 1 | |
Level 3 | Service health dashboards | ✅ 1 |
SLA calculations driven from SLO metrics | ➖ Reason: Service is not user facing | |
All components include an apdex | ✅ 1 | |
Logging includes metadata for measuring scalability | ➖ Reason: Metadata can't be injected in redis logs | |
Developer guides exist in developer documentation | ✅ 1 | |
SRE guides exist in runbooks | ✅ 1 | |
Metrics on downstream service usage | ⚪ Not Implemented | |
Level 4 | Prepared Kibana dashboards | ⚪ Not Implemented |
Dashboards linked from metrics catalogs | ⚪ Not Implemented | |
Automatic alert routing | ⚪ Not Implemented | |
Level 5 | Long-term forecasting utilization and usage | ⚪ Not Implemented |
70% of requests covered by at least one SLI | ⚪ Not Implemented | |
Automatic alert routing | ⚪ Not Implemented |
redis-cluster-feature-flag detail
- Level 3
- Link to redis-cluster-feature-flag dashboard
- Service definition of redis-cluster-feature-flag
Level | Criterion | Passed |
---|---|---|
Level 1 | Exists in the service catalog | ✅ 1 |
Structured logs available in Kibana | ✅ 1, 2 | |
Service exists in the dependency graph | ✅ 1 | |
Level 2 | SLO monitoring: apdex | ✅ 1 |
SLO monitoring: error rate | ✅ 1 | |
SLO monitoring: request rate | ✅ 1 | |
Level 3 | Service health dashboards | ✅ 1 |
SLA calculations driven from SLO metrics | ➖ Reason: Service is not user facing | |
All components include an apdex | ✅ 1 | |
Logging includes metadata for measuring scalability | ➖ Reason: Metadata can't be injected in redis logs | |
Developer guides exist in developer documentation | ✅ 1 | |
SRE guides exist in runbooks | ✅ 1 | |
Metrics on downstream service usage | ⚪ Not Implemented | |
Level 4 | Prepared Kibana dashboards | ⚪ Not Implemented |
Dashboards linked from metrics catalogs | ⚪ Not Implemented | |
Automatic alert routing | ⚪ Not Implemented | |
Level 5 | Long-term forecasting utilization and usage | ⚪ Not Implemented |
70% of requests covered by at least one SLI | ⚪ Not Implemented | |
Automatic alert routing | ⚪ Not Implemented |
redis-cluster-queues-meta detail
Level | Criterion | Passed |
---|---|---|
Level 1 | Exists in the service catalog | ✅ 1 |
Structured logs available in Kibana | ✅ 1, 2 | |
Service exists in the dependency graph | ✅ 1 | |
Level 2 | SLO monitoring: apdex | ✅ 1 |
SLO monitoring: error rate | ✅ 1 | |
SLO monitoring: request rate | ✅ 1 | |
Level 3 | Service health dashboards | ✅ 1 |
SLA calculations driven from SLO metrics | ➖ Reason: Service is not user facing | |
All components include an apdex | ✅ 1 | |
Logging includes metadata for measuring scalability | ➖ Reason: Metadata can't be injected in redis logs | |
Developer guides exist in developer documentation | ✅ 1 | |
SRE guides exist in runbooks | ✅ 1 | |
Metrics on downstream service usage | ⚪ Not Implemented | |
Level 4 | Prepared Kibana dashboards | ⚪ Not Implemented |
Dashboards linked from metrics catalogs | ⚪ Not Implemented | |
Automatic alert routing | ⚪ Not Implemented | |
Level 5 | Long-term forecasting utilization and usage | ⚪ Not Implemented |
70% of requests covered by at least one SLI | ⚪ Not Implemented | |
Automatic alert routing | ⚪ Not Implemented |
redis-cluster-ratelimiting detail
- Level 3
- Link to redis-cluster-ratelimiting dashboard
- Service definition of redis-cluster-ratelimiting
Level | Criterion | Passed |
---|---|---|
Level 1 | Exists in the service catalog | ✅ 1 |
Structured logs available in Kibana | ✅ 1, 2 | |
Service exists in the dependency graph | ✅ 1 | |
Level 2 | SLO monitoring: apdex | ✅ 1 |
SLO monitoring: error rate | ✅ 1 | |
SLO monitoring: request rate | ✅ 1 | |
Level 3 | Service health dashboards | ✅ 1 |
SLA calculations driven from SLO metrics | ➖ Reason: Service is not user facing | |
All components include an apdex | ✅ 1 | |
Logging includes metadata for measuring scalability | ➖ Reason: Metadata can't be injected in redis logs | |
Developer guides exist in developer documentation | ✅ 1 | |
SRE guides exist in runbooks | ⚪ Not Implemented | |
Metrics on downstream service usage | ⚪ Not Implemented | |
Level 4 | Prepared Kibana dashboards | ⚪ Not Implemented |
Dashboards linked from metrics catalogs | ⚪ Not Implemented | |
Automatic alert routing | ⚪ Not Implemented | |
Level 5 | Long-term forecasting utilization and usage | ⚪ Not Implemented |
70% of requests covered by at least one SLI | ⚪ Not Implemented | |
Automatic alert routing | ⚪ Not Implemented |
redis-cluster-repo-cache detail
Level | Criterion | Passed |
---|---|---|
Level 1 | Exists in the service catalog | ✅ 1 |
Structured logs available in Kibana | ✅ 1, 2 | |
Service exists in the dependency graph | ✅ 1 | |
Level 2 | SLO monitoring: apdex | ✅ 1 |
SLO monitoring: error rate | ✅ 1 | |
SLO monitoring: request rate | ✅ 1 | |
Level 3 | Service health dashboards | ✅ 1 |
SLA calculations driven from SLO metrics | ➖ Reason: Service is not user facing | |
All components include an apdex | ✅ 1 | |
Logging includes metadata for measuring scalability | ➖ Reason: Metadata can't be injected in redis logs | |
Developer guides exist in developer documentation | ✅ 1 | |
SRE guides exist in runbooks | ✅ 1 | |
Metrics on downstream service usage | ⚪ Not Implemented | |
Level 4 | Prepared Kibana dashboards | ⚪ Not Implemented |
Dashboards linked from metrics catalogs | ⚪ Not Implemented | |
Automatic alert routing | ⚪ Not Implemented | |
Level 5 | Long-term forecasting utilization and usage | ⚪ Not Implemented |
70% of requests covered by at least one SLI | ⚪ Not Implemented | |
Automatic alert routing | ⚪ Not Implemented |
redis-cluster-shared-state detail
- Level 3
- Link to redis-cluster-shared-state dashboard
- Service definition of redis-cluster-shared-state
Level | Criterion | Passed |
---|---|---|
Level 1 | Exists in the service catalog | ✅ 1 |
Structured logs available in Kibana | ✅ 1, 2 | |
Service exists in the dependency graph | ✅ 1 | |
Level 2 | SLO monitoring: apdex | ✅ 1 |
SLO monitoring: error rate | ✅ 1 | |
SLO monitoring: request rate | ✅ 1 | |
Level 3 | Service health dashboards | ✅ 1 |
SLA calculations driven from SLO metrics | ➖ Reason: Service is not user facing | |
All components include an apdex | ✅ 1 | |
Logging includes metadata for measuring scalability | ➖ Reason: Metadata can't be injected in redis logs | |
Developer guides exist in developer documentation | ✅ 1 | |
SRE guides exist in runbooks | ✅ 1 | |
Metrics on downstream service usage | ⚪ Not Implemented | |
Level 4 | Prepared Kibana dashboards | ⚪ Not Implemented |
Dashboards linked from metrics catalogs | ⚪ Not Implemented | |
Automatic alert routing | ⚪ Not Implemented | |
Level 5 | Long-term forecasting utilization and usage | ⚪ Not Implemented |
70% of requests covered by at least one SLI | ⚪ Not Implemented | |
Automatic alert routing | ⚪ Not Implemented |
redis-db-load-balancing detail
Level | Criterion | Passed |
---|---|---|
Level 1 | Exists in the service catalog | ✅ 1 |
Structured logs available in Kibana | ✅ 1, 2 | |
Service exists in the dependency graph | ✅ 1 | |
Level 2 | SLO monitoring: apdex | ✅ 1 |
SLO monitoring: error rate | ✅ 1 | |
SLO monitoring: request rate | ✅ 1 | |
Level 3 | Service health dashboards | ✅ 1 |
SLA calculations driven from SLO metrics | ➖ Reason: Service is not user facing | |
All components include an apdex | ✅ 1 | |
Logging includes metadata for measuring scalability | ➖ Reason: Metadata can't be injected in redis logs | |
Developer guides exist in developer documentation | ✅ 1 | |
SRE guides exist in runbooks | ✅ 1 | |
Metrics on downstream service usage | ⚪ Not Implemented | |
Level 4 | Prepared Kibana dashboards | ⚪ Not Implemented |
Dashboards linked from metrics catalogs | ⚪ Not Implemented | |
Automatic alert routing | ⚪ Not Implemented | |
Level 5 | Long-term forecasting utilization and usage | ⚪ Not Implemented |
70% of requests covered by at least one SLI | ⚪ Not Implemented | |
Automatic alert routing | ⚪ Not Implemented |
redis-pubsub detail
Level | Criterion | Passed |
---|---|---|
Level 1 | Exists in the service catalog | ✅ 1 |
Structured logs available in Kibana | ✅ 1, 2 | |
Service exists in the dependency graph | ✅ 1 | |
Level 2 | SLO monitoring: apdex | ✅ 1 |
SLO monitoring: error rate | ✅ 1 | |
SLO monitoring: request rate | ✅ 1 | |
Level 3 | Service health dashboards | ✅ 1 |
SLA calculations driven from SLO metrics | ➖ Reason: Service is not user facing | |
All components include an apdex | ✅ 1 | |
Logging includes metadata for measuring scalability | ➖ Reason: Metadata can't be injected in redis logs | |
Developer guides exist in developer documentation | ✅ 1 | |
SRE guides exist in runbooks | ✅ 1 | |
Metrics on downstream service usage | ⚪ Not Implemented | |
Level 4 | Prepared Kibana dashboards | ⚪ Not Implemented |
Dashboards linked from metrics catalogs | ⚪ Not Implemented | |
Automatic alert routing | ⚪ Not Implemented | |
Level 5 | Long-term forecasting utilization and usage | ⚪ Not Implemented |
70% of requests covered by at least one SLI | ⚪ Not Implemented | |
Automatic alert routing | ⚪ Not Implemented |
redis-registry-cache detail
Level | Criterion | Passed |
---|---|---|
Level 1 | Exists in the service catalog | ✅ 1 |
Structured logs available in Kibana | ✅ 1, 2 | |
Service exists in the dependency graph | ✅ 1 | |
Level 2 | SLO monitoring: apdex | ✅ 1 |
SLO monitoring: error rate | ✅ 1 | |
SLO monitoring: request rate | ✅ 1 | |
Level 3 | Service health dashboards | ✅ 1 |
SLA calculations driven from SLO metrics | ➖ Reason: Service is not user facing | |
All components include an apdex | ✅ 1 | |
Logging includes metadata for measuring scalability | ➖ Reason: Metadata can't be injected in redis logs | |
Developer guides exist in developer documentation | ❌ | |
SRE guides exist in runbooks | ❌ | |
Metrics on downstream service usage | ⚪ Not Implemented | |
Level 4 | Prepared Kibana dashboards | ⚪ Not Implemented |
Dashboards linked from metrics catalogs | ⚪ Not Implemented | |
Automatic alert routing | ⚪ Not Implemented | |
Level 5 | Long-term forecasting utilization and usage | ⚪ Not Implemented |
70% of requests covered by at least one SLI | ⚪ Not Implemented | |
Automatic alert routing | ⚪ Not Implemented |
redis-sessions detail
Level | Criterion | Passed |
---|---|---|
Level 1 | Exists in the service catalog | ✅ 1 |
Structured logs available in Kibana | ✅ 1, 2 | |
Service exists in the dependency graph | ✅ 1 | |
Level 2 | SLO monitoring: apdex | ✅ 1 |
SLO monitoring: error rate | ✅ 1 | |
SLO monitoring: request rate | ✅ 1 | |
Level 3 | Service health dashboards | ✅ 1 |
SLA calculations driven from SLO metrics | ➖ Reason: Service is not user facing | |
All components include an apdex | ✅ 1 | |
Logging includes metadata for measuring scalability | ➖ Reason: Metadata can't be injected in redis logs | |
Developer guides exist in developer documentation | ✅ 1 | |
SRE guides exist in runbooks | ✅ 1 | |
Metrics on downstream service usage | ⚪ Not Implemented | |
Level 4 | Prepared Kibana dashboards | ⚪ Not Implemented |
Dashboards linked from metrics catalogs | ⚪ Not Implemented | |
Automatic alert routing | ⚪ Not Implemented | |
Level 5 | Long-term forecasting utilization and usage | ⚪ Not Implemented |
70% of requests covered by at least one SLI | ⚪ Not Implemented | |
Automatic alert routing | ⚪ Not Implemented |
redis-sidekiq detail
Level | Criterion | Passed |
---|---|---|
Level 1 | Exists in the service catalog | ✅ 1 |
Structured logs available in Kibana | ✅ 1, 2 | |
Service exists in the dependency graph | ✅ 1 | |
Level 2 | SLO monitoring: apdex | ✅ 1 |
SLO monitoring: error rate | ✅ 1 | |
SLO monitoring: request rate | ✅ 1 | |
Level 3 | Service health dashboards | ✅ 1 |
SLA calculations driven from SLO metrics | ➖ Reason: Service is not user facing | |
All components include an apdex | ✅ 1 | |
Logging includes metadata for measuring scalability | ➖ Reason: Metadata can't be injected in redis logs | |
Developer guides exist in developer documentation | ✅ 1 | |
SRE guides exist in runbooks | ✅ 1 | |
Metrics on downstream service usage | ⚪ Not Implemented | |
Level 4 | Prepared Kibana dashboards | ⚪ Not Implemented |
Dashboards linked from metrics catalogs | ⚪ Not Implemented | |
Automatic alert routing | ⚪ Not Implemented | |
Level 5 | Long-term forecasting utilization and usage | ⚪ Not Implemented |
70% of requests covered by at least one SLI | ⚪ Not Implemented | |
Automatic alert routing | ⚪ Not Implemented |
redis-tracechunks detail
Level | Criterion | Passed |
---|---|---|
Level 1 | Exists in the service catalog | ✅ 1 |
Structured logs available in Kibana | ✅ 1, 2 | |
Service exists in the dependency graph | ✅ 1 | |
Level 2 | SLO monitoring: apdex | ✅ 1 |
SLO monitoring: error rate | ✅ 1 | |
SLO monitoring: request rate | ✅ 1 | |
Level 3 | Service health dashboards | ✅ 1 |
SLA calculations driven from SLO metrics | ➖ Reason: Service is not user facing | |
All components include an apdex | ✅ 1 | |
Logging includes metadata for measuring scalability | ➖ Reason: Metadata can't be injected in redis logs | |
Developer guides exist in developer documentation | ✅ 1 | |
SRE guides exist in runbooks | ✅ 1 | |
Metrics on downstream service usage | ⚪ Not Implemented | |
Level 4 | Prepared Kibana dashboards | ⚪ Not Implemented |
Dashboards linked from metrics catalogs | ⚪ Not Implemented | |
Automatic alert routing | ⚪ Not Implemented | |
Level 5 | Long-term forecasting utilization and usage | ⚪ Not Implemented |
70% of requests covered by at least one SLI | ⚪ Not Implemented | |
Automatic alert routing | ⚪ Not Implemented |
registry detail
Level | Criterion | Passed |
---|---|---|
Level 1 | Exists in the service catalog | ✅ 1 |
Structured logs available in Kibana | ✅ 1, 2, 3, 4, 5 | |
Service exists in the dependency graph | ✅ 1 | |
Level 2 | SLO monitoring: apdex | ✅ 1 |
SLO monitoring: error rate | ✅ 1 | |
SLO monitoring: request rate | ✅ 1 | |
Level 3 | Service health dashboards | ✅ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13 |
SLA calculations driven from SLO metrics | ⚪ Not Implemented | |
All components include an apdex | ❌ | |
Logging includes metadata for measuring scalability | ⚪ Not Implemented | |
Developer guides exist in developer documentation | ✅ 1 | |
SRE guides exist in runbooks | ✅ 1 | |
Metrics on downstream service usage | ⚪ Not Implemented | |
Level 4 | Prepared Kibana dashboards | ⚪ Not Implemented |
Dashboards linked from metrics catalogs | ⚪ Not Implemented | |
Automatic alert routing | ⚪ Not Implemented | |
Level 5 | Long-term forecasting utilization and usage | ⚪ Not Implemented |
70% of requests covered by at least one SLI | ⚪ Not Implemented | |
Automatic alert routing | ⚪ Not Implemented |
runway detail
Level | Criterion | Passed |
---|---|---|
Level 1 | Exists in the service catalog | ✅ 1 |
Structured logs available in Kibana | ➖ Reason: Runway is a platform. The logs are available in Stackdriver. | |
Service exists in the dependency graph | ✅ 1 | |
Level 2 | SLO monitoring: apdex | ❌ |
SLO monitoring: error rate | ❌ | |
SLO monitoring: request rate | ❌ | |
Level 3 | Service health dashboards | ✅ 1, 2, 3 |
SLA calculations driven from SLO metrics | ➖ Reason: Service is not user facing | |
All components include an apdex | ❌ | |
Logging includes metadata for measuring scalability | ⚪ Not Implemented | |
Developer guides exist in developer documentation | ✅ 1 | |
SRE guides exist in runbooks | ✅ 1 | |
Metrics on downstream service usage | ⚪ Not Implemented | |
Level 4 | Prepared Kibana dashboards | ⚪ Not Implemented |
Dashboards linked from metrics catalogs | ⚪ Not Implemented | |
Automatic alert routing | ⚪ Not Implemented | |
Level 5 | Long-term forecasting utilization and usage | ⚪ Not Implemented |
70% of requests covered by at least one SLI | ⚪ Not Implemented | |
Automatic alert routing | ⚪ Not Implemented |
search detail
Level | Criterion | Passed |
---|---|---|
Level 1 | Exists in the service catalog | ✅ 1 |
Structured logs available in Kibana | ✅ 1, 2 | |
Service exists in the dependency graph | ✅ 1 | |
Level 2 | SLO monitoring: apdex | ✅ 1 |
SLO monitoring: error rate | ❌ | |
SLO monitoring: request rate | ✅ 1 | |
Level 3 | Service health dashboards | ✅ 1, 2 |
SLA calculations driven from SLO metrics | ➖ Reason: Service is not user facing | |
All components include an apdex | ❌ | |
Logging includes metadata for measuring scalability | ⚪ Not Implemented | |
Developer guides exist in developer documentation | ✅ 1 | |
SRE guides exist in runbooks | ✅ 1 | |
Metrics on downstream service usage | ⚪ Not Implemented | |
Level 4 | Prepared Kibana dashboards | ⚪ Not Implemented |
Dashboards linked from metrics catalogs | ⚪ Not Implemented | |
Automatic alert routing | ⚪ Not Implemented | |
Level 5 | Long-term forecasting utilization and usage | ⚪ Not Implemented |
70% of requests covered by at least one SLI | ⚪ Not Implemented | |
Automatic alert routing | ⚪ Not Implemented |
sentry detail
Level | Criterion | Passed |
---|---|---|
Level 1 | Exists in the service catalog | ✅ 1 |
Structured logs available in Kibana | ➖ Reason: We are migrating our self-managed Sentry instance to the hosted one. For more information, see https://gitlab.com/gitlab-com/gl-infra/reliability/-/issues/13963. Sentry logs are also available in Stackdriver. | |
Service exists in the dependency graph | ➖ Reason: Sentry is an independent internal observability tool | |
Level 2 | SLO monitoring: apdex | ✅ 1 |
SLO monitoring: error rate | ✅ 1 | |
SLO monitoring: request rate | ✅ 1 | |
Level 3 | Service health dashboards | ✅ 1, 2, 3 |
SLA calculations driven from SLO metrics | ➖ Reason: Service is not user facing | |
All components include an apdex | ❌ | |
Logging includes metadata for measuring scalability | ⚪ Not Implemented | |
Developer guides exist in developer documentation | ✅ 1 | |
SRE guides exist in runbooks | ✅ 1 | |
Metrics on downstream service usage | ⚪ Not Implemented | |
Level 4 | Prepared Kibana dashboards | ⚪ Not Implemented |
Dashboards linked from metrics catalogs | ⚪ Not Implemented | |
Automatic alert routing | ⚪ Not Implemented | |
Level 5 | Long-term forecasting utilization and usage | ⚪ Not Implemented |
70% of requests covered by at least one SLI | ⚪ Not Implemented | |
Automatic alert routing | ⚪ Not Implemented |
sidekiq detail
Level | Criterion | Passed |
---|---|---|
Level 1 | Exists in the service catalog | ✅ 1 |
Structured logs available in Kibana | ✅ 1, 2, 3, 4, 5, 6, 7 | |
Service exists in the dependency graph | ✅ 1 | |
Level 2 | SLO monitoring: apdex | ✅ 1 |
SLO monitoring: error rate | ✅ 1 | |
SLO monitoring: request rate | ✅ 1 | |
Level 3 | Service health dashboards | ✅ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12 |
SLA calculations driven from SLO metrics | ⚪ Not Implemented | |
All components include an apdex | ❌ | |
Logging includes metadata for measuring scalability | ⚪ Not Implemented | |
Developer guides exist in developer documentation | ✅ 1 | |
SRE guides exist in runbooks | ✅ 1 | |
Metrics on downstream service usage | ⚪ Not Implemented | |
Level 4 | Prepared Kibana dashboards | ⚪ Not Implemented |
Dashboards linked from metrics catalogs | ⚪ Not Implemented | |
Automatic alert routing | ⚪ Not Implemented | |
Level 5 | Long-term forecasting utilization and usage | ⚪ Not Implemented |
70% of requests covered by at least one SLI | ⚪ Not Implemented | |
Automatic alert routing | ⚪ Not Implemented |
thanos detail
Level | Criterion | Passed |
---|---|---|
Level 1 | Exists in the service catalog | ✅ 1 |
Structured logs available in Kibana | ✅ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12 | |
Service exists in the dependency graph | ➖ Reason: Thanos is an independent internal observability tool. It fetches metrics from other services but does not interact with them functionally. | |
Level 2 | SLO monitoring: apdex | ✅ 1 |
SLO monitoring: error rate | ✅ 1 | |
SLO monitoring: request rate | ✅ 1 | |
Level 3 | Service health dashboards | ✅ 1, 2, 3, 4, 5 |
SLA calculations driven from SLO metrics | ➖ Reason: Service is not user facing | |
All components include an apdex | ❌ | |
Logging includes metadata for measuring scalability | ⚪ Not Implemented | |
Developer guides exist in developer documentation | ⚪ Not Implemented | |
SRE guides exist in runbooks | ✅ 1 | |
Metrics on downstream service usage | ⚪ Not Implemented | |
Level 4 | Prepared Kibana dashboards | ⚪ Not Implemented |
Dashboards linked from metrics catalogs | ⚪ Not Implemented | |
Automatic alert routing | ⚪ Not Implemented | |
Level 5 | Long-term forecasting utilization and usage | ⚪ Not Implemented |
70% of requests covered by at least one SLI | ⚪ Not Implemented | |
Automatic alert routing | ⚪ Not Implemented |
tracing detail
Level | Criterion | Passed |
---|---|---|
Level 1 | Exists in the service catalog | ✅ 1 |
Structured logs available in Kibana | ✅ 1 | |
Service exists in the dependency graph | ✅ 1 | |
Level 2 | SLO monitoring: apdex | ✅ 1 |
SLO monitoring: error rate | ✅ 1 | |
SLO monitoring: request rate | ✅ 1 | |
Level 3 | Service health dashboards | ✅ 1 |
SLA calculations driven from SLO metrics | ➖ Reason: Service is not user facing | |
All components include an apdex | ✅ 1 | |
Logging includes metadata for measuring scalability | ⚪ Not Implemented | |
Developer guides exist in developer documentation | ✅ 1 | |
SRE guides exist in runbooks | ❌ | |
Metrics on downstream service usage | ⚪ Not Implemented | |
Level 4 | Prepared Kibana dashboards | ⚪ Not Implemented |
Dashboards linked from metrics catalogs | ⚪ Not Implemented | |
Automatic alert routing | ⚪ Not Implemented | |
Level 5 | Long-term forecasting utilization and usage | ⚪ Not Implemented |
70% of requests covered by at least one SLI | ⚪ Not Implemented | |
Automatic alert routing | ⚪ Not Implemented |
vault detail
Level | Criterion | Passed |
---|---|---|
Level 1 | Exists in the service catalog | ✅ 1 |
Structured logs available in Kibana | ➖ Reason: Vault is a pending project and currently receives no traffic. We'll add logs and metrics in https://gitlab.com/groups/gitlab-com/gl-infra/-/epics/739 | |
Service exists in the dependency graph | ➖ Reason: Vault is a pending project and currently receives no traffic. Progress can be tracked at https://gitlab.com/groups/gitlab-com/gl-infra/-/epics/739 | |
Level 2 | SLO monitoring: apdex | ✅ 1 |
SLO monitoring: error rate | ✅ 1 | |
SLO monitoring: request rate | ✅ 1 | |
Level 3 | Service health dashboards | ✅ 1 |
SLA calculations driven from SLO metrics | ➖ Reason: Service is not user facing | |
All components include an apdex | ❌ | |
Logging includes metadata for measuring scalability | ⚪ Not Implemented | |
Developer guides exist in developer documentation | ➖ Reason: Vault is an infrastructure component; developers do not interact with it | |
SRE guides exist in runbooks | ✅ 1 | |
Metrics on downstream service usage | ⚪ Not Implemented | |
Level 4 | Prepared Kibana dashboards | ⚪ Not Implemented |
Dashboards linked from metrics catalogs | ⚪ Not Implemented | |
Automatic alert routing | ⚪ Not Implemented | |
Level 5 | Long-term forecasting utilization and usage | ⚪ Not Implemented |
70% of requests covered by at least one SLI | ⚪ Not Implemented | |
Automatic alert routing | ⚪ Not Implemented |
web detail
Level | Criterion | Passed |
---|---|---|
Level 1 | Exists in the service catalog | ✅ 1 |
Structured logs available in Kibana | ✅ 1, 2, 3, 4, 5, 6, 7, 8 | |
Service exists in the dependency graph | ✅ 1 | |
Level 2 | SLO monitoring: apdex | ✅ 1 |
SLO monitoring: error rate | ✅ 1 | |
SLO monitoring: request rate | ✅ 1 | |
Level 3 | Service health dashboards | ✅ 1, 2, 3, 4, 5, 6, 7 |
SLA calculations driven from SLO metrics | ⚪ Not Implemented | |
All components include an apdex | ❌ | |
Logging includes metadata for measuring scalability | ⚪ Not Implemented | |
Developer guides exist in developer documentation | ✅ 1 | |
SRE guides exist in runbooks | ✅ 1 | |
Metrics on downstream service usage | ⚪ Not Implemented | |
Level 4 | Prepared Kibana dashboards | ⚪ Not Implemented |
Dashboards linked from metrics catalogs | ⚪ Not Implemented | |
Automatic alert routing | ⚪ Not Implemented | |
Level 5 | Long-term forecasting utilization and usage | ⚪ Not Implemented |
70% of requests covered by at least one SLI | ⚪ Not Implemented | |
Automatic alert routing | ⚪ Not Implemented |
web-pages detail
Level | Criterion | Passed |
---|---|---|
Level 1 | Exists in the service catalog | ✅ 1 |
Structured logs available in Kibana | ✅ 1, 2 | |
Service exists in the dependency graph | ✅ 1 | |
Level 2 | SLO monitoring: apdex | ✅ 1 |
SLO monitoring: error rate | ✅ 1 | |
SLO monitoring: request rate | ✅ 1 | |
Level 3 | Service health dashboards | ✅ 1, 2, 3, 4, 5, 6, 7 |
SLA calculations driven from SLO metrics | ⚪ Not Implemented | |
All components include an apdex | ❌ | |
Logging includes metadata for measuring scalability | ⚪ Not Implemented | |
Developer guides exist in developer documentation | ✅ 1 | |
SRE guides exist in runbooks | ✅ 1 | |
Metrics on downstream service usage | ⚪ Not Implemented | |
Level 4 | Prepared Kibana dashboards | ⚪ Not Implemented |
Dashboards linked from metrics catalogs | ⚪ Not Implemented | |
Automatic alert routing | ⚪ Not Implemented | |
Level 5 | Long-term forecasting utilization and usage | ⚪ Not Implemented |
70% of requests covered by at least one SLI | ⚪ Not Implemented | |
Automatic alert routing | ⚪ Not Implemented |
websockets detail
Level | Criterion | Passed |
---|---|---|
Level 1 | Exists in the service catalog | ✅ 1 |
Structured logs available in Kibana | ✅ 1, 2, 3, 4, 5, 6 | |
Service exists in the dependency graph | ✅ 1 | |
Level 2 | SLO monitoring: apdex | ✅ 1 |
SLO monitoring: error rate | ✅ 1 | |
SLO monitoring: request rate | ✅ 1 | |
Level 3 | Service health dashboards | ✅ 1, 2, 3, 4, 5, 6 |
SLA calculations driven from SLO metrics | ➖ Reason: Service is not user facing | |
All components include an apdex | ❌ | |
Logging includes metadata for measuring scalability | ⚪ Not Implemented | |
Developer guides exist in developer documentation | ✅ 1 | |
SRE guides exist in runbooks | ✅ 1 | |
Metrics on downstream service usage | ⚪ Not Implemented | |
Level 4 | Prepared Kibana dashboards | ⚪ Not Implemented |
Dashboards linked from metrics catalogs | ⚪ Not Implemented | |
Automatic alert routing | ⚪ Not Implemented | |
Level 5 | Long-term forecasting utilization and usage | ⚪ Not Implemented |
70% of requests covered by at least one SLI | ⚪ Not Implemented | |
Automatic alert routing | ⚪ Not Implemented |
woodhouse detail
Level | Criterion | Passed |
---|---|---|
Level 1 | Exists in the service catalog | ✅ 1 |
Structured logs available in Kibana | ➖ Reason: Log volume is very low; the tooling links to Stackdriver that are provided are sufficient for this purpose | |
Service exists in the dependency graph | ✅ 1 | |
Level 2 | SLO monitoring: apdex | ✅ 1 |
SLO monitoring: error rate | ✅ 1 | |
SLO monitoring: request rate | ✅ 1 | |
Level 3 | Service health dashboards | ✅ 1 |
SLA calculations driven from SLO metrics | ➖ Reason: Service is not user facing | |
All components include an apdex | ✅ 1 | |
Logging includes metadata for measuring scalability | ⚪ Not Implemented | |
Developer guides exist in developer documentation | ✅ 1 | |
SRE guides exist in runbooks | ✅ 1 | |
Metrics on downstream service usage | ⚪ Not Implemented | |
Level 4 | Prepared Kibana dashboards | ⚪ Not Implemented |
Dashboards linked from metrics catalogs | ⚪ Not Implemented | |
Automatic alert routing | ⚪ Not Implemented | |
Level 5 | Long-term forecasting utilization and usage | ⚪ Not Implemented |
70% of requests covered by at least one SLI | ⚪ Not Implemented | |
Automatic alert routing | ⚪ Not Implemented |
Last modified May 18, 2024: Remove engineering exclusion and fix related errors (bed28b7f)