Performance and Scalability
The Quality Department focuses on measuring and improving the performance of GitLab, as well as creating and validating reference architectures that self-managed customers can rely on as performant configurations.
Reference Architectures
To ensure that self-managed customers have performant, reliable, and scalable on-premise configurations, the Quality Department has built and verified Reference Architectures. The goal is to provide tested and verified examples to customers which can be used to ensure good performance and give insight into what changes need to be made as organizations scale.
The Reference Architectures project is used to track all work related to GitLab Reference Architectures, and the #reference-architectures Slack channel is used for discussions related to the Reference Architectures.
Users | Status | Link to more info |
---|---|---|
1k | Complete | Documentation |
1k hybrid | Complete | Documentation |
2k | Complete | Documentation |
2k hybrid | Complete | Documentation |
3k | Complete | Documentation |
3k hybrid | Complete | Documentation |
5k | Complete | Documentation |
5k hybrid | Complete | Documentation |
10k | Complete | Documentation |
10k hybrid | Complete | Documentation |
25k | Complete | Documentation |
25k hybrid | Complete | Documentation |
50k | Complete | Documentation |
50k hybrid | Complete | Documentation |
100k | To Do (on demand) | Issue link |
Performance Tool
We have created the GitLab Performance Tool (GPT), which measures the performance of various endpoints under load. The Tool is used internally at GitLab, but it is also available for self-managed customers to set up and run in their own environments.
If you have a self-managed instance and you would like to use the Tool to test its performance, please take a look at the documentation in the Tool’s README file.
More detailed information about the current test list that is run by GPT can be viewed at the Test Details wiki page.
Test Process
The GitLab Performance Tool is run against the existing reference architectures using the latest Nightly release of GitLab. This allows us to catch and triage degradations early in the process so that we can try to implement fixes before a new release is created. If problems are found, issues are created for degraded endpoints and are then prioritized during the weekly Bug Refinement meeting.
High-level GPT pipeline overview:
- Update environment job: starts up and updates the target environment from Quality Config with the latest Nightly using GitLab Environment Toolkit
- Test job: runs performance tests against the environment
- Report job: publishes results to the GPT Wiki and the #gpt-performance-run Slack channel
- Stop job: stops the target environment instances to save costs
Test Results
Information on the testing results can be found over on the Reference Architecture documentation.
Performance results comparison of different GitLab versions
Every month on the 23rd, a comparison pipeline is triggered that produces a performance comparison table for the last 5 GitLab versions. It builds GitLab Docker containers with the test data using the performance-images project, runs GPT against the last 5 GitLab versions simultaneously, and then generates a performance results summary.
The latest results are automatically posted to the GitLab versions wiki page in the GPT project and to the #gpt-performance-run Slack channel.
No shared environments usage
To ensure consistent and reliable performance results we need to effectively control each part of the process, including the test environment setup and its test data, for the following reasons:
- Shared environments with unknown environment loads and test data shapes can notably skew performance results.
- Performance test runs can affect other pipelines running against the environment, such as GitLab QA.
- Environment configuration such as rate limits can block the tests from running correctly.
- Performance test runs can take more than 90 minutes to complete. Deployments on some environments can occur within 50-60 minutes, which would impact the results notably.
- Investigating performance test failures to find the root cause wouldn't be feasible, both for the reasons above and because we wouldn't have full access to the environment to perform investigations.
For the above reasons we test against fully controlled environments and don't test against others such as Staging or Production.
No performance test runs in merge requests
GitLab Performance Tool tests are not executed in merge requests due to several critical factors:
- Requirement for Consistent Test Conditions:
- Performance tests demand strictly repeatable conditions for accurate results.
- This includes identical server specifications, network conditions, and test data across runs.
- Cost, Time and Resource Constraints:
- A complete performance pipeline, including environment setup, data seeding, test execution, and teardown, can exceed 6 hours.
- This duration is not cost-effective for merge request pipelines and can significantly slow down the development process.
- Full-scale performance tests require a Reference Architecture environment, which is impractical and costly to build for each merge request.
- It may also consume excessive CI/CD resources, impacting other critical pipelines.
- Result Interpretation Complexity:
- Performance test results often have inherent variability or “noise”.
- Accurate interpretation requires human expertise to distinguish between normal fluctuations and actual performance degradations.
- This manual review process is not feasible for every merge request.
- Focus on End-to-End Performance:
- These tests are designed to evaluate the overall system performance, which may not be significantly impacted by individual merge requests.
Given these considerations, we adopt an approach of conducting comprehensive performance tests at the end of the test chain, where we can best control the conditions and allocate necessary resources.
For shifting performance testing left, the recommended approach is to break performance testing down into specific components rather than the entire application. For example, the GitLab team maintains performance testing for Database Queries. A similar unit-level performance testing approach can be followed by creating dedicated test frameworks, where components are configured with only mock data and stress tested accordingly; a minimal sketch of this idea is shown below.
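As a hedged illustration only, the sketch below shows what a component-level performance check against mock data could look like: it times a single function in isolation and fails if it exceeds a latency budget. The function, dataset shape, and budget are hypothetical and are not part of GPT or any GitLab test framework.

```python
# Minimal sketch of unit-level performance testing against mock data.
# All names (build_search_index, LATENCY_BUDGET_SECONDS) are hypothetical,
# not part of GPT or any GitLab framework.
import random
import string
import timeit


def build_search_index(records):
    """Example component under test: maps each keyword to matching record ids."""
    index = {}
    for record_id, text in records:
        for word in text.split():
            index.setdefault(word, set()).add(record_id)
    return index


def make_mock_records(count=10_000, words_per_record=20):
    """Generate synthetic records so the check needs no shared environment or real data."""
    vocab = ["".join(random.choices(string.ascii_lowercase, k=6)) for _ in range(500)]
    return [
        (record_id, " ".join(random.choices(vocab, k=words_per_record)))
        for record_id in range(count)
    ]


LATENCY_BUDGET_SECONDS = 0.5  # hypothetical budget for this component


def test_build_search_index_performance():
    records = make_mock_records()
    # Take the best of several runs to reduce noise from the test machine.
    best = min(timeit.repeat(lambda: build_search_index(records), repeat=5, number=1))
    assert best < LATENCY_BUDGET_SECONDS, (
        f"build_search_index took {best:.3f}s, budget is {LATENCY_BUDGET_SECONDS}s"
    )


if __name__ == "__main__":
    test_build_search_index_performance()
    print("component performance check passed")
```

Because the component is fed only mock data, this kind of check stays repeatable and cheap enough to run much earlier in the pipeline than a full Reference Architecture test.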
Expanding the Tool
The Quality Department aims to enhance GPT and expand performance test coverage. One of the goals is to release GPT v3; you can track its progress in this epic. We plan to further increase the test coverage, especially in more complex areas like CI/CD and Registry.
Additionally, we would like to define a process for conducting an endpoint coverage review on some regular cadence, whether that is after every release, once a quarter, or some other timing. Because GitLab is constantly expanding and evolving, we need to iterate on our coverage in tandem.
We’ve created an epic to track the initial expansion as well as the work defining our recurring process for analyzing endpoints and verifying our coverage is adequate.
Another area that the Quality team would like to explore is shifting performance testing left.
Browser Performance Tool
We have created the GitLab Browser Performance Tool (GBPT) to specifically test web page frontend performance in browsers. More detailed information about the current list of test pages can be viewed at the Test Details wiki page.
The testing process is similar to the GPT testing process. After the 10k environment is updated to the latest Nightly, GBPT is run against the environment, and then the environment is shut down to save costs.
Environment | GCP project | Schedule | Latest results and dashboards |
---|---|---|---|
10k | 10k | Every weekday | 10k wiki |
Performance Playbook
We have developed a playbook of initial steps to investigate the problem when self-managed customers experience, or suspect they are experiencing, performance issues.
The first step is requesting logs. We use a tool called fast-stats in conjunction with the following log artifacts. These should be either rotated logs, or logs from a peak day captured after peak time (a minimal sketch of this kind of log analysis follows the list).
- Rails logs: production_json.log
- API logs: api_json.log
- Gitaly logs: /var/log/gitlab/gitaly/current
- Sidekiq logs: /var/log/gitlab/sidekiq/current
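As a hedged illustration only (not a substitute for fast-stats, whose actual usage is documented in its project), the sketch below shows the kind of aggregation such a tool performs: parsing a JSON-lines Rails log and summarizing request durations per endpoint. The field names (duration_s, method, path) are assumptions about the production_json.log format and may differ between GitLab versions.

```python
# Minimal sketch of summarizing a GitLab JSON-lines log (e.g. production_json.log).
# Illustrates the kind of analysis fast-stats automates; it is not fast-stats itself.
# Field names (duration_s, method, path) are assumptions and may vary by GitLab version.
import json
import statistics
import sys
from collections import defaultdict


def summarize(log_path, top=10):
    durations = defaultdict(list)  # (method, path) -> request durations in seconds
    with open(log_path, encoding="utf-8") as log_file:
        for line in log_file:
            try:
                entry = json.loads(line)
            except json.JSONDecodeError:
                continue  # skip malformed lines
            if "duration_s" in entry and "path" in entry:
                durations[(entry.get("method", "?"), entry["path"])].append(entry["duration_s"])

    # Rank endpoints by total time spent serving them.
    ranked = sorted(durations.items(), key=lambda item: sum(item[1]), reverse=True)
    print(f"{'count':>7} {'mean_s':>8} {'p99_s':>8}  endpoint")
    for (method, path), values in ranked[:top]:
        p99 = statistics.quantiles(values, n=100)[98] if len(values) > 1 else values[0]
        print(f"{len(values):>7} {statistics.mean(values):>8.3f} {p99:>8.3f}  {method} {path}")


if __name__ == "__main__":
    summarize(sys.argv[1] if len(sys.argv) > 1 else "production_json.log")
```

Running it against a rotated log from a peak day gives a quick view of which endpoints dominate request time, which is the same question the playbook uses fast-stats to answer.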