GitLab Dedicated Observability and Monitoring (Grafana)
Grafana data is primarily for internal use
Monitoring and observability graphs from Grafana should not be shared with customers by default.
If you think sharing graphs would benefit the customer, please read Sharing internal logs, data & graphs.
Accessing Grafana
The credentials for accessing the Grafana instance associated with each GitLab Dedicated tenant are stored in the GitLab Dedicated - Support
1Password Vault. As with the OpenSearch instances, the passwords are referred to by customer number. Use the information in the internal note in the ticket and the subdomain in the website
field in 1Password to find the credentials for the correct GitLab Dedicated tenant.
Preprod deployments
Use the GitLab Dedicated Preprod switchboard to find links to the Grafana dashboard for a specific customer.
Finding Dashboards
When logging in for the first time, dashboards are not immediately visible and you will be greeted by a Grafana welcome screen. To find the dashboards:
- In the left pane, click on
Dashboards
. - The
Dashboards
page has a searchable list of all the dashboards available to you - The
Triage
dashboard is the best starting point, especially if you are new to Grafana.
Grafana tips
The General / Triage
dashboard is most useful for an emergency as it has the pods all laid out in a single view. By default it has 6 hours of data. It is helpful for finding blips & dips. Use this data to correlate to other dashboards.
If there is a point of interest on a given graph in the dashboard, you can click + drag
to zoom in on the graph.
Remember that Grafana is used for visualizing issues and spotting problems. It won’t tell us directly what is wrong. You must correlate to the logs to find the exact problem.
As an example, the Triage dashboard may show that webservice
errors are increasing. Use this to correlate an approximate time and filter out the other logs by the kubernetes.labels.app
to find the error for just webservice
.
The General / Triage
dashboard contains information about Service Apdex (application performance index), Service Error Ratio, RPS (requests per second) and Saturation information for each service included in that dashboard. In general, you should notice that increases in the Service Error Ratio correspond to decreases in the Service Apdex. Read more about GitLab Dedicated SLAs.
e0420803
)