Performance Testing at GitLab

Overview

Performance Testing is a broad discipline that includes various approaches to evaluate a system’s performance characteristics. Load Testing, while often considered synonymous with Performance Testing, is only one of many approaches. Other approaches do not involve generating load at all, and they enable shifting Performance Testing both left and right.

Shift Performance Testing Left and Right

Performance testing is not limited to the final stages of development or to load testing scenarios. It can and should be integrated throughout the entire software development lifecycle, from early stages (shift left) to production monitoring (shift right). This comprehensive approach allows teams to gain a holistic understanding of their system’s performance characteristics. It can also be applied at every testing level, rather than waiting for a full component or system to be ready for testing.

Shifting left in performance testing involves:

  1. Early-stage performance considerations:
    • Unit Testing: Utilizing performance-focused gems and frameworks during development.
    • Profiling: Analyzing code execution, memory usage, and CPU utilization from the outset.
    • Database Performance Testing: Assessing query performance and data access patterns early in development.
  2. Continuous performance awareness:
    • Instrumenting Existing Tests: Capturing performance metrics from regular test runs.
    • Observability Testing: Leveraging monitoring tools to identify performance trends before they become issues.
    • Contract Testing: Defining and testing performance expectations at system boundaries.

Shifting right involves:

  1. Production-level performance evaluation:
    • Load Testing: Simulating real-world usage scenarios to understand system behavior under various loads.
    • Stress Testing: Pushing the system beyond normal capacity to identify breaking points.
    • Soak Testing: Evaluating performance over extended periods of continuous load.
  2. Ongoing performance monitoring:
    • Real-time Observability: Continuously monitoring production systems for performance anomalies.
    • User-centric Performance Metrics: Gathering and analyzing performance data from actual user interactions.

By combining both left-shifted and right-shifted approaches, teams can create feedback loops that:

  • Identify potential performance issues earlier in the development cycle.
  • Continuously validate and improve performance throughout the application lifecycle.
  • Gain insights into real-world performance characteristics and user experiences.
  • Create a culture of performance awareness across development, operations, and business teams.

It’s important to note that performance results from one testing level may not directly translate to another. For example, a code change that improves a unit test runtime by one second will probably not result in a one-second improvement in production. However, these metrics serve as valuable indicators in a fast feedback loop, helping teams quickly identify potential performance impacts of code changes.

Unit Testing

At the unit test level, GitLab includes several gems that can be used to test performance during development, giving us feedback before the code is finalized; the Profiling section below lists several of them.

We also have rspec-benchmark, so we can assert on performance results directly in RSpec.
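
As a minimal sketch (the measured blocks and the 50 ms threshold below are illustrative placeholders, not real GitLab specs), an rspec-benchmark assertion typically looks like this:

```ruby
require "rspec-benchmark"

RSpec.describe "hypothetical performance expectations" do
  include RSpec::Benchmark::Matchers

  it "completes within a time budget" do
    # Fails if the block does not consistently finish in under 50 ms.
    expect { 10_000.times.map { |i| i * i }.sum }.to perform_under(50).ms
  end

  it "prefers the faster of two implementations" do
    # Range#sum uses the arithmetic-series formula, so it should beat inject.
    expect { (1..100_000).sum }.to perform_faster_than { (1..100_000).inject(:+) }
  end
end
```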

Observability Testing

Observability testing means actively using our Observability tools to detect trends before they develop into performance issues. A couple of common approaches are to:

  1. Have the development teams monitor the dashboards for their components and proactively pick up performance concerns
  2. Build dashboards/tooling that support exploratory testing of the Observability data, looking for linkages that may not be obvious (for example, system A causing system B to slow down)
    • Tools like the Performance Bar can enable someone to notice a performance oddity and start the investigation into the root cause

Instrumenting Existing Testing

We run a large number of tests on a regular basis; by capturing performance results from these runs, we can drive improvements. We can do this in a couple of ways:

  1. Capturing performance results from the tests (for example, the duration a test took to run) and comparing them between runs; see the sketch after this list. The performance results would not be directly mappable to production but can show a performance change.
  2. Adding tests that specifically look for performance impacts; prime examples are in the unit testing section.
  3. Using the Performance Bar to analyze performance as you are manually testing GitLab.
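
A minimal sketch of the first approach, assuming a hypothetical support file and output path; in practice rspec_profiling (listed under Profiling below) automates a richer version of this idea:

```ruby
# spec/support/duration_capture.rb -- hypothetical helper, not an existing GitLab file.
# Appends one CSV row per example so durations can be compared between pipeline runs.
require "csv"

RSpec.configure do |config|
  config.around(:each) do |example|
    started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    example.run
    elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started

    # The output path is a placeholder; point it wherever your pipeline collects artifacts.
    CSV.open("tmp/spec_durations.csv", "a") do |csv|
      csv << [example.metadata[:full_description], example.metadata[:location], elapsed.round(4)]
    end
  end
end
```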

Contract Testing

Contract testing is the concept of adding a test at the boundary of each system (or subsystem) that defines how it interacts with other systems. These contracts can include functional (data format, available endpoints, …) and performance (response time, throttling, …) assertions.
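
A minimal sketch of such a contract test, written in RSpec; the endpoint, expected fields, and the 300 ms budget are illustrative assumptions, not an agreed GitLab contract:

```ruby
require "net/http"
require "json"

RSpec.describe "projects API contract (illustrative)" do
  let(:uri) { URI("http://localhost:8080/api/v4/projects/1") } # hypothetical target

  it "meets the functional and performance parts of the contract" do
    started  = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    response = Net::HTTP.get_response(uri)
    elapsed  = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started

    # Functional assertions: status code and data format.
    expect(response.code).to eq("200")
    expect(JSON.parse(response.body)).to include("id", "name")

    # Performance assertion: response time stays within the agreed budget.
    expect(elapsed).to be < 0.3
  end
end
```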

Profiling

We already use static analysis tools (for example, rubocop) in our pipelines to ensure that we meet coding guidelines and avoid common problematic patterns. Several performance-focused profiling and benchmarking tools are also in our codebase:

  1. ruby-prof: A comprehensive profiling solution that supports both flat and graph profiles. ruby-prof can measure CPU time, memory allocation, and object creation.
  2. stackprof: A sampling call-stack profiler. It’s designed to be a faster and more memory-efficient alternative to ruby-prof for certain use cases.
  3. memory_profiler: A memory profiler that provides detailed information about memory usage, including object allocation and retention. Documentation is in our performance guidelines.
  4. rbspy: A sampling profiler for Ruby. Documentation is in our Sidekiq troubleshooting docs.
  5. derailed_benchmarks: A set of benchmarks that measure various aspects of Rails application performance, including memory usage and load time. Documentation is in our performance guidelines.
  6. benchmark-ips: Benchmarks a block’s iterations per second.
  7. rspec_profiling: Collects data on spec execution times. Documentation is in our performance guidelines.

Some approaches to using these tools are detailed on the profiling page.
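
For orientation, here is an illustrative standalone script showing how two of the gems above are typically invoked; the code under measurement is a placeholder:

```ruby
require "benchmark/ips"
require "memory_profiler"

# benchmark-ips: compare iterations per second of two implementations.
Benchmark.ips do |x|
  x.report("map + sum") { (1..1_000).map { |i| i * i }.sum }
  x.report("inject")    { (1..1_000).inject(0) { |acc, i| acc + i * i } }
  x.compare!
end

# memory_profiler: report object allocation and retention for a block.
report = MemoryProfiler.report do
  1_000.times { "allocated string" * 2 }
end
report.pretty_print(scale_bytes: true)
```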

Database Performance Testing

Database Performance Testing usually focuses on analyzing slow queries and the number of queries generated by page views and actions. Some work on this topic already exists.
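
As an illustration of the "number of queries" angle, the queries issued by a block of Rails code can be counted with the standard sql.active_record notification. This is a simplified sketch of the idea, and the usage shown in the trailing comment is hypothetical:

```ruby
require "active_support/notifications"

# Counts the SQL statements issued while the block runs, ignoring schema
# and transaction bookkeeping queries.
def count_queries
  count = 0
  subscriber = ActiveSupport::Notifications.subscribe("sql.active_record") do |_name, _started, _finished, _id, payload|
    count += 1 unless %w[SCHEMA TRANSACTION].include?(payload[:name])
  end
  yield
  count
ensure
  ActiveSupport::Notifications.unsubscribe(subscriber)
end

# Hypothetical usage in a request spec: fail if a page view issues too many queries.
# expect(count_queries { get project_path(project) }).to be <= 10
```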

Load Testing

Load testing is a crucial form of performance testing that simulates real-world usage scenarios to understand how a system behaves under different levels of concurrent users or transactions. It generates unique insights that cannot be obtained through other methods. Its main downside is that it can only be done late in the development cycle, because you need a functioning environment to generate load against.

Load testing itself has several variations, including:

  • Stress Testing: Pushing the system beyond its normal capacity to identify breaking points.
  • Soak Testing: Evaluating system performance over an extended period of continuous load.
  • Steady State Testing: Assessing system behavior under a consistent, moderate load over time.
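
For orientation only, the mechanics of generating a steady load can be sketched in a few lines of Ruby; our real load tests are driven by GPT and GBPT (see below), and the target URL, thread count, and duration here are placeholder assumptions:

```ruby
# Toy steady-state load generator: a few threads issue GET requests for a fixed
# duration and report basic latency percentiles. Not a replacement for GPT.
require "net/http"

target     = URI(ENV.fetch("TARGET_URL", "http://localhost:8080/-/readiness"))
threads    = 10
duration_s = 30

latencies = Queue.new
deadline  = Process.clock_gettime(Process::CLOCK_MONOTONIC) + duration_s

workers = threads.times.map do
  Thread.new do
    while Process.clock_gettime(Process::CLOCK_MONOTONIC) < deadline
      started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
      Net::HTTP.get_response(target)
      latencies << Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
    end
  end
end
workers.each(&:join)

samples = Array.new(latencies.size) { latencies.pop }.sort
abort "no samples collected" if samples.empty?
puts "requests: #{samples.size}"
puts "p50: #{(samples[samples.size / 2] * 1000).round(1)} ms"
puts "p95: #{(samples[(samples.size * 0.95).floor] * 1000).round(1)} ms"
```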

Load Test Challenges

Load testing in the cloud presents a number of challenges:

  • Cloud environments are transitory, so the environment you are testing now may look significantly different after the test is done
    • Load tests are only directly mappable to the environment that they are run against
  • Modern cloud-based systems are large enough that replicating them is incredibly expensive
    • Raw system cost: a large number of subsystems are scaled out due to load
    • The data that is in the system affects performance (an empty database will give different performance than a fully loaded one)
      • The volume of data / data model often only exists in a production sized environment
  • Systems can auto-scale as the load increases
    • This can cause costs to increase beyond reasonable limits
    • Performance tests need to be designed to generate load so the scale-ups happen at known points (or to avoid an autoscale)
  • A poorly designed Stress Test will mostly just confirm that autoscaling functions as contracted
    • That guarantee is better covered by an SLA with the vendor

System Level Load Testing

We currently conduct load testing using GPT and GBPT. This testing is predominantly run against our Reference Architectures. It can also be run against a live environment, but caution should be applied when running against shared environments.

Component Level Load Testing

We can run load tests on specific subcomponents. This can be a subsystem (like Gitaly) or a specific server. This testing can focus on validating that the load on that subsystem is optimal.

References

External References

  • Slack’s Koi Pond - Slack’s approach to organizing their load testing effort into pods of “koi” that test specific sections
  • Using test automation to enhance Observability - A presentation Andy did on using test automation to improve Observability
  • Measure app performance in Visual Studio - Microsoft course on profiling in Visual Studio
  • Shift Left Performance Testing - Blog post about shifting performance testing left
  • Netflix performance testing - Blog post about performance testing at Netflix
  • Automation Pyramid Model for Performance Testing Process - Blog post looking into the test pyramid for performance testing
  • Continuous Performance Testing: A Comprehensive Guide - Blog post on Continuous Performance Testing
  • 3 Challenges to Effective Performance Testing in Continuous Integration - Blog post on the challenges of implementing performance testing in CI
  • When is the Best Time to Start Performance Testing? - Blog post on when to do performance testing
  • The Performance Driven Development manifesto - An approach to shifting performance testing left

Internal References

Projects

  • GPT - The GitLab Performance Tool (gpt) is built and maintained by the GitLab Quality Enablement team to provide performance testing of any GitLab instance
  • GBPT - SiteSpeed CI pipelines for Quality Performance testing
  • sitespeed-measurement-setup - Setup to measure performance on GitLab websites (.com, dev.) through sitespeed.io and report to Grafana
  • gitlab-exporter - A Prometheus web exporter that exports GitLab metrics

Documentation pages

  • Profiling page - Documentation on approaches to profiling GitLab
  • Performance Bar - Can be used in a running GitLab instance to see metrics
  • Performance Guidelines - Our docs page on performance guidelines
  • Cells Performance Testing - Cells performance test strategy handbook page
  • Metrics Catalog - Home for our SLA/SLO/SLI definitions
  • Cells Performance Dashboard - First pass at creating an Observability performance dashboard in Grafana
  • Platform Triage Dashboard - The home page dashboard for our Grafana, a common starting point for investigating performance in our Observability