Performance and Scalability
The Quality Department focuses on measuring and improving the performance of GitLab, and on creating and validating reference architectures that self-managed customers can rely on as performant configurations.
Reference Architectures
To ensure that self-managed customers have performant, reliable, and scalable on-premise configurations, the Quality Department has built and verified Reference Architectures. The goal is to provide tested and verified examples to customers which can be used to ensure good performance and give insight into what changes need to be made as organizations scale.
The Reference Architectures project is used to track all work related to GitLab Reference Architectures, and the #reference-architectures Slack channel is used for discussions about them.
Users | Status | Link to more info |
---|---|---|
1k | Complete | Documentation |
1k hybrid | Complete | Documentation |
2k | Complete | Documentation |
2k hybrid | Complete | Documentation |
3k | Complete | Documentation |
3k hybrid | Complete | Documentation |
5k | Complete | Documentation |
5k hybrid | Complete | Documentation |
10k | Complete | Documentation |
10k hybrid | Complete | Documentation |
25k | Complete | Documentation |
25k hybrid | Complete | Documentation |
50k | Complete | Documentation |
50k hybrid | Complete | Documentation |
100k | To Do (on demand) | Issue link |
Performance Tool
We have created the GitLab Performance Tool which measures the performance of various endpoints under load. This Tool is in use internally within GitLab, but it is also available for self-managed customers to set up and run in their own environments.
If you have a self-managed instance and you would like to use the Tool to test its performance, please take a look at the documentation in the Tool’s README file.
More detailed information about the current test list that is run by GPT can be viewed at the Test Details wiki page.
Test Process
The GitLab Performance Tool is run against the existing reference architectures using the latest Nightly release of GitLab. This allows us to catch and triage degradations early in the process so that we can try to implement fixes before a new release is created. If problems are found, issues are created for degraded endpoints and are then prioritized during the weekly Bug Refinement meeting.
High-level GPT pipeline overview:
- Update environment job: starts up and updates the target environment from Quality Config with the latest Nightly using GitLab Environment Toolkit
- Test job: runs performance tests against the environment
- Report job: publishes results to the GPT Wiki and the #qa-performance Slack channel
- Stop job: stops the target environment instances to save costs
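The job sequence above can be sketched as a small sequential runner. This is only an illustration, not the actual GPT pipeline definition: the job names, the `run_pipeline` helper, and the choice to skip later jobs on failure while always running the stop job are assumptions made for the example.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical representation of a pipeline job: (name, action).
Job = Tuple[str, Callable[[], None]]


def run_pipeline(jobs: List[Job], stop_job: Job) -> Tuple[List[str], Optional[Exception]]:
    """Run jobs in order; return the names of started jobs and any error.

    Assumption: the stop job always runs, even if an earlier job fails,
    so the environment is not left running after a failed pipeline.
    """
    started: List[str] = []
    error: Optional[Exception] = None
    try:
        for name, action in jobs:
            started.append(name)
            action()
    except Exception as exc:  # remember the failure, skip remaining jobs
        error = exc
    finally:
        stop_name, stop_action = stop_job
        started.append(stop_name)
        stop_action()
    return started, error


def noop() -> None:
    """Placeholder action standing in for a real job."""


def failing_test() -> None:
    """Placeholder for a test job that fails."""
    raise RuntimeError("performance test failed")


jobs = [("update_environment", noop), ("test", failing_test), ("report", noop)]
started, error = run_pipeline(jobs, ("stop", noop))
print(started)  # ['update_environment', 'test', 'stop']
```

In this sketch a failing test job skips the report job but still triggers the stop job, which reflects the cost-saving intent of the final step.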
Test Results
Information on the testing results can be found over on the Reference Architecture documentation.
Performance results comparison of different GitLab versions
On the 23rd of every month, a comparison pipeline is triggered that produces a table comparing performance results across the last 5 GitLab versions. It builds a GitLab Docker container with the test data using the performance-images project, runs GPT against the last 5 GitLab versions simultaneously, and then generates a performance results summary.
The latest results are automatically posted to the GitLab versions wiki page in the GPT project and to the #qa-performance Slack channel.
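As a rough illustration of what a cross-version summary might look like, the sketch below builds a text table from per-version results. The version labels, endpoint names, and timings are made-up placeholders, and the table layout is an assumption rather than the format GPT actually produces.

```python
from typing import Dict, List, Optional


def build_comparison_table(
    results: Dict[str, Dict[str, int]], versions: List[str]
) -> str:
    """Build a pipe-delimited table of response times (ms) per version.

    `results` maps an endpoint name to {version: time in ms}.
    A version with no recorded result is shown as "-".
    """
    header = "Endpoint | " + " | ".join(versions) + " |"
    separator = "---|" * (len(versions) + 1)
    lines = [header, separator]
    for endpoint in sorted(results):
        cells = []
        for version in versions:
            value: Optional[int] = results[endpoint].get(version)
            cells.append("-" if value is None else f"{value}ms")
        lines.append(f"{endpoint} | " + " | ".join(cells) + " |")
    return "\n".join(lines)


# Placeholder data: five hypothetical versions, two hypothetical endpoints.
example_versions = ["v1", "v2", "v3", "v4", "v5"]
example_results = {
    "endpoint_b": {"v1": 210, "v2": 205, "v3": 220, "v4": 215, "v5": 208},
    "endpoint_a": {"v1": 120, "v2": 118, "v3": 125, "v5": 119},
}
table = build_comparison_table(example_results, example_versions)
print(table)
```

Laying the versions out as columns makes a regression in a single release visible as a jump within one row.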
No shared environments usage
To ensure consistent and reliable performance results we need to effectively control each part of the process, including the test environment setup and its test data, for the following reasons:
- Shared environments with unknown environment loads and test data shapes can notably skew performance results.
- Performance test runs can affect other pipelines running against the environment, such as GitLab QA.
- Environment configuration such as rate limits can block the tests from running correctly.
- Performance test runs can take more than 90 minutes to complete. Deployments on some environments can occur as often as every 50-60 minutes, which would notably affect the results.
- Investigating performance test failures would not be feasible: the factors above make it hard to isolate the cause, and we do not have full access to these environments to carry out investigations.
For the above reasons we test against fully controlled environments and don’t test others such as Staging or Production.
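The timing concern in the list above can be made concrete with a little arithmetic. The sketch below assumes deployments recur at a fixed interval (one reading of the 50-60 minute figure) and computes how many deployments must, and how many may, land inside a test run of a given length. The function names and the fixed-interval model are illustrative assumptions.

```python
def min_deployments_during_run(run_minutes: int, deploy_interval_minutes: int) -> int:
    """Fewest deployments that fall inside a run, whatever the start offset."""
    if run_minutes < 0 or deploy_interval_minutes <= 0:
        raise ValueError("run length must be >= 0 and interval must be > 0")
    return run_minutes // deploy_interval_minutes


def max_deployments_during_run(run_minutes: int, deploy_interval_minutes: int) -> int:
    """Most deployments that can fall inside a run (ceiling division)."""
    if run_minutes < 0 or deploy_interval_minutes <= 0:
        raise ValueError("run length must be >= 0 and interval must be > 0")
    return -(-run_minutes // deploy_interval_minutes)


for interval in (50, 60):
    low = min_deployments_during_run(90, interval)
    high = max_deployments_during_run(90, interval)
    print(f"90-minute run, deployment every {interval} min: {low} to {high} deployments")
```

Under these assumptions, a 90-minute run always overlaps at least one deployment, so no run on such an environment would be free of deployment noise.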
Expanding the Tool
The Quality Department aims to enhance the GPT and performance test coverage. One of the goals is to release GPT v3; you can track its progress in this epic. We plan to further increase the test coverage, especially in more complex areas like CI/CD and Registry.
Additionally, we would like to define a process for conducting an endpoint coverage review on some regular cadence, whether that is after every release, once a quarter, or some other timing. Because GitLab is constantly expanding and evolving, we need to iterate on our coverage in tandem.
We’ve created an epic to track the initial expansion as well as the work defining our recurring process for analyzing endpoints and verifying our coverage is adequate.
Another area that the Quality team would like to explore is shifting performance testing left.
Browser Performance Tool
We have created the GitLab Browser Performance Tool to specifically test web page frontend performance in browsers. More detailed information about the current test pages list can be viewed at the Test Details wiki page.
The testing process is similar to the GPT testing process. After the 10k environment is updated to the latest Nightly, GBPT is run against the environment, which is then shut down to save costs.
Environment | GCP project | Schedule | Latest results and dashboards |
---|---|---|---|
10k | 10k | Every weekday | 10k wiki |
Performance Playbook
We have developed a playbook of initial steps for investigating the problem when self-managed customers experience, or suspect they are experiencing, performance issues.
The first step is requesting logs. We use a tool called fast-stats in conjunction with the following log artifacts. These should be either rotated logs or logs from a peak day, collected after peak time.
- production_json.log
- api_json.log
- Gitaly logs: /var/log/gitlab/gitaly/current
- Sidekiq logs: /var/log/gitlab/sidekiq/current
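Before passing logs to an analysis tool, it can help to confirm that each requested file is present and non-empty. The following is a minimal sketch only: the `check_log_artifacts` helper is hypothetical, it does not invoke fast-stats, and the paths passed in should be whatever locations apply to the instance being investigated.

```python
import os
from typing import Dict, Iterable


def check_log_artifacts(paths: Iterable[str]) -> Dict[str, str]:
    """Classify each path as 'ok', 'empty', or 'missing'."""
    status: Dict[str, str] = {}
    for path in paths:
        if not os.path.isfile(path):
            status[path] = "missing"
        elif os.path.getsize(path) == 0:
            status[path] = "empty"
        else:
            status[path] = "ok"
    return status


# Example: check the Gitaly log location listed above.
report = check_log_artifacts(["/var/log/gitlab/gitaly/current"])
for path, state in report.items():
    print(f"{state:8} {path}")
```

A file reported as missing or empty is a cue to ask for the log again before starting any analysis.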