Working with the GitLab.com database for developers

A developer’s guide to working with the GitLab.com database

GitLab.com is powered by a large PostgreSQL database (“the database” in this doc), which is often used as a point of reference in terms of scale - after all, it is the largest installation of GitLab we have access to.

From a development perspective, it is often necessary to gather statistics and other insights from this database - for example, to support query optimization during database review, or to understand data distribution when informing a product or development decision.

This overview is aimed at developers who want to access the database, and also explains how to get this type of access in the first place.

Types of access

There are a couple of ways for developers to directly access the database:

Direct access to production

Direct database access to a production replica

This is the most straightforward type of access: you’ll have a psql console on an actual replica of the database (which typically serves live traffic). The console is read-only, and a 15-second statement timeout must be adhered to.

How to get access: File an access-request and ask for production database console access (role: db-console) to receive permanent SSH access, or use Teleport to request temporary access (Recommended). See below for setting up SSH access.

Access to the “archive” replica

In this case, you’ll have access to a psql console on the “archive” replica. This is a replica of the database which does not serve live traffic and is intended for more exploratory use (it is suitable for analytical queries, too). The console access is read-only and a 15-minute statement timeout applies. You can verify this by opening the database console and issuing the show statement_timeout command.
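
For example, a quick sanity check from the psql prompt might look like the following (a minimal sketch; the exact output formatting depends on your psql version, and the expected values are simply the timeouts described above):

-- Check the statement timeout enforced for the current session
SHOW statement_timeout;      -- expected: 15min on the archive replica, 15s on a production replica
-- Confirm the session is read-only (forced on physical replicas)
SHOW transaction_read_only;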

How to get access: File an access-request and ask for console access to the archive database in production (role: db-console-archive). See below for setting up SSH access.

Rails console access

A Rails console can be used to access the database, too. Use it with extreme care, as this is typically a read-write console.

How to get access: File an access-request and ask for Rails console access in production (role: rails-console) to receive permanent SSH access, or use Teleport to request temporary access (Recommended). See below for setting up SSH access.

Set up SSH configuration for direct access

In order to set up direct access, you will have to configure SSH to use the bastion instances as a proxy:

  1. Set up bastions to access production (gprd)
  2. Set up bastions to access staging (gstg)

This is an example configuration for a user named joe (replace this with your GitLab email username) to access psql and Rails consoles in gprd:

# GCP production bastion host
Host lb-bastion.gprd.gitlab.com
        User                    joe

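# Rails console (typically read-write - use with extreme care)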
Host gprd-rails
        User                    joe-rails
        StrictHostKeyChecking   no
        HostName                console-01-sv-gprd.c.gitlab-production.internal
        ProxyCommand            ssh lb-bastion.gprd.gitlab.com -W %h:%p

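# psql console on a production replica (read-only, 15-second statement timeout)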
Host gprd-psql
        User                    joe-db
        StrictHostKeyChecking   no
        HostName                console-01-sv-gprd.c.gitlab-production.internal
        ProxyCommand            ssh lb-bastion.gprd.gitlab.com -W %h:%p

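# psql console on the archive replica (read-only, 15-minute statement timeout)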
Host gprd-psql-archive
        User                    joe-db-archive
        StrictHostKeyChecking   no
        HostName                console-01-sv-gprd.c.gitlab-production.internal
        ProxyCommand            ssh lb-bastion.gprd.gitlab.com -W %h:%p

With this setup, the user joe could use the following commands:

ssh gprd-rails        # opens a rails console
ssh gprd-psql         # opens a psql console
ssh gprd-psql-archive # opens a psql console on a replica

Database Lab

Use postgres.ai to work with a thin clone of the database (includes direct psql access to the thin clone)

We have access to postgres.ai, which gives us the ability to work with a thin clone of the database. This has the benefit of providing a fully isolated, read-write database for a single user. Thin clones are inexpensive to create and can be used and destroyed as needed.

With the extended setup, you’ll be able to use a psql console on a thin clone, too. This is the most flexible way of working with the database, as it even allows you to create new tables (for example, for intermediate results) as needed.
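
As a rough sketch of what this enables, assuming you have psql access to your own thin clone (the scratch table name below is made up, and the query is purely illustrative):

-- Illustrative only: stash intermediate results in a scratch table on the thin clone
CREATE TABLE tmp_projects_per_namespace AS
SELECT namespace_id, COUNT(*) AS project_count
FROM projects
GROUP BY namespace_id;

-- Query the intermediate results, then clean up when done
SELECT * FROM tmp_projects_per_namespace ORDER BY project_count DESC LIMIT 10;
DROP TABLE tmp_projects_per_namespace;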

Query plans and optimization

To evaluate query performance, we have the following tools available:

  1. Retrieve query plans through ChatOps on Slack
  2. Use #database-lab Slack channel to work with a thin clone of the database
  3. Use postgres.ai (currently under evaluation) to work in depth with a thin clone to optimize queries
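
For example, once you have a psql console on the archive replica or a thin clone, a plan can be inspected directly with standard PostgreSQL tooling (the query and project id below are made-up illustrations, not a real GitLab query):

-- Illustrative only: inspect the execution plan and buffer usage of a query
-- Note: ANALYZE executes the query, so it is subject to the statement timeout
EXPLAIN (ANALYZE, BUFFERS)
SELECT id
FROM issues
WHERE project_id = 12345      -- example project id
ORDER BY created_at DESC
LIMIT 20;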

Where do I get help with setting up access or with any other questions?

Please reach out to #database on Slack and consider adding to this page afterwards.

Resources and Tips

Process

Various Product and Engineering processes are available to advocate for database improvements that may not fit anywhere in the feature development roadmaps.

  1. Leverage planning as a team. According to this finding, more than 50% of known query performance issues were identified a long time ago. Leverage planning as a team to prioritize these issues with Product. The process is being implemented by Introducing Error Budgets to Stage Groups.
  2. Follow the instructions in Prioritize technical decisions when collaborating with Product in release planning, which is also mirrored in the Product handbook page.
  3. Consider Engineering Allocation for tech-debt issues, which is also mirrored in the Product handbook.
  4. Keep the Architecture process in mind when running into large scale fundamental design challenges.

Best practices

Common issues seen in the application codebase are N+1 queries, inappropriate use of CTEs, and read-only queries not leveraging read replicas.

  1. Guidance on N+1 Queries
    1. Query Recorder
    2. Performance guidelines
    3. Merge request performance guidelines - Query counts
    4. Merge request performance guidelines - Cached queries
  2. Use CTEs wisely (see the sketch after this list)
  3. Use read replicas when possible
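
As a rough illustration of why CTE usage deserves care: on PostgreSQL versions before 12 a CTE always acted as an optimization fence, while newer versions let you control materialization explicitly. The query below is a made-up sketch, not a real GitLab query:

-- Illustrative only: explicitly control CTE materialization on PostgreSQL 12+
WITH recent_issues AS MATERIALIZED (
  SELECT id, project_id
  FROM issues
  WHERE created_at > now() - interval '7 days'
)
SELECT project_id, COUNT(*) AS issue_count
FROM recent_issues
GROUP BY project_id;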

Knowledge sharing

  1. Video recording - Techniques for researching inefficient DB queries. Highly recommended!
  2. Using Elasticsearch with PostgreSQL slow logs for troubleshooting.
  3. Slow queries chart to identify potential sub-optimal queries.

Tools

  1. Postgres.ai for testing with production-like data.
  2. Query plans and optimization.