Performance results datastore
Summary
This blueprint proposes a Performance Results Datastore that builds on the GitLab Performance Tool by centralizing performance metrics, enabling data-informed decisions, supporting dynamic baselines, and integrating performance awareness throughout the development lifecycle.
Goals
- Create a centralized repository for all performance test results
- Enable programmatic access to performance data for analysis and visualization
- Enable trend analysis between different test runs, environments, and GitLab versions
- Provide a foundation for automated performance regression detection
- Integrate with existing CI/CD pipelines for automatic data collection
- Support both high-level aggregated metrics and detailed raw performance data
- Enable dynamic baseline creation based on historical data
Non-Goals
- Replacing existing performance testing tools: We’ll enhance and integrate with tools like the GitLab Performance Tool rather than replace them
- Duplicating our monitoring infrastructure: We’ll leverage our current monitoring solutions rather than creating parallel systems
- Reinventing visualization: We’ll continue to make use of current visualization tools instead of creating custom alternatives
- Competing with real-time monitoring: We’ll complement our existing real-time monitoring capabilities rather than replacing them
- Expanding beyond performance focus: We’ll maintain a dedicated focus on performance metrics to ensure depth and relevance of insights
This initiative is about amplification and evolution of our performance capabilities, not replacement. We’re building upon the solid foundation created by previous work to unlock new possibilities without discarding the valuable systems already in place.
Proposal
We propose building a Performance Results Datastore to enable cross-functional analysis and interpretation of performance metrics. Key capabilities will include:
- Support multiple types of performance tests
- Store results with rich metadata (GitLab version, environment details, test parameters); a schema sketch follows this list
- Provide flexible querying capabilities for various analysis needs
- Scale to accommodate growing volumes of performance data
- Integrate seamlessly with existing CI/CD pipelines
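To make the metadata requirement concrete, a single result could be modelled as one InfluxDB point, with tags for the run context and fields for the measured values. The following sketch is illustrative only; the measurement, tag, and field names are placeholders rather than a settled schema.

```python
from influxdb_client import Point

# Hypothetical schema: one point per endpoint per test run.
# Tags hold the run context (indexed, low cardinality);
# fields hold the measured values.
result = (
    Point("gpt_endpoint_result")                    # measurement name (illustrative)
    .tag("test_name", "api_v4_projects_project")    # which test produced this result
    .tag("gitlab_version", "17.5.1")
    .tag("environment", "10k-reference-architecture")
    .tag("pipeline_source", "mr_pipeline")          # or "reference_architecture", "ux"
    .field("rps_result", 193.4)                     # requests per second achieved
    .field("ttfb_p90_ms", 412.0)                    # 90th percentile time to first byte
    .field("error_rate", 0.0)
)
print(result.to_line_protocol())
```

Keeping the run context in tags keeps those dimensions indexed and queryable, which is what the trend analysis and dynamic baseline use cases below rely on.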
Implementation Approach
We’ll leverage existing infrastructure (InfluxDB and Grafana) to create a proof-of-concept that enables:
- Storage of performance data from multiple sources
- Visualization through Grafana dashboards
- Programmatic access from CI/CD pipelines
Infrastructure Details
The following resources have been provisioned for this implementation:
- InfluxDB Instance
  - URL: https://influxdb.quality.gitlab.net/
  - Bucket name: `perf-test-metrics`
- Grafana Instance
  - URL: https://dashboards.quality.gitlab.net/
  - Connected to the InfluxDB instance for visualization of performance metrics
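A CI job could push result points to the provisioned instance roughly as follows. The URL and bucket name come from the details above; the token, organization, and CI variable names are assumptions made for this sketch.

```python
import os

from influxdb_client import InfluxDBClient, Point
from influxdb_client.client.write_api import SYNCHRONOUS

# Connection details from the provisioned infrastructure. The token and org are
# placeholders expected to arrive via (hypothetical) CI/CD variables.
INFLUXDB_URL = "https://influxdb.quality.gitlab.net/"
INFLUXDB_BUCKET = "perf-test-metrics"
INFLUXDB_TOKEN = os.environ["PERF_INFLUXDB_TOKEN"]                    # assumed variable name
INFLUXDB_ORG = os.environ.get("PERF_INFLUXDB_ORG", "gitlab-quality")  # assumed org name


def write_results(points: list[Point]) -> None:
    """Push a batch of performance result points to the datastore."""
    with InfluxDBClient(url=INFLUXDB_URL, token=INFLUXDB_TOKEN, org=INFLUXDB_ORG) as client:
        write_api = client.write_api(write_options=SYNCHRONOUS)
        write_api.write(bucket=INFLUXDB_BUCKET, record=points)
```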
Architecture Overview
```mermaid
flowchart LR
    %% Define nodes with meaningful IDs
    subgraph SRC["Performance Data Sources"]
        direction TB
        RA["Reference Architecture Runs"]
        MR["MR Pipeline Tests"]
        UX["UX Performance (GBPT)"]
    end

    subgraph STORE["Performance Datastore"]
        direction TB
        SS["Server-side Metrics"]
        FE["Frontend/UX Metrics"]
    end

    subgraph VISUAL["Analysis & Visualization"]
        direction TB
        CI["CI/CD Pipeline Feedback"]
        GRAF["Grafana Dashboards"]
    end

    subgraph CONS["Consumers"]
        direction TB
        DEV["Engineering Teams"]
        PERF["Performance Stakeholders"]
    end

    %% Define the flow connections with better spacing
    RA --> SS
    MR --> SS
    UX --> FE
    SS --> CI
    SS --> GRAF
    FE --> CI
    FE --> GRAF
    CI --> DEV
    GRAF --> DEV
    GRAF --> PERF

    %% Style the different components
    classDef sources fill:#e1f5fe,stroke:#0288d1,stroke-width:2px,rx:10px,ry:10px
    classDef store fill:#fff8e1,stroke:#ffa000,stroke-width:2px,rx:10px,ry:10px
    classDef visual fill:#f3e5f5,stroke:#7b1fa2,stroke-width:2px,rx:10px,ry:10px
    classDef consumers fill:#ffebee,stroke:#c62828,stroke-width:2px,rx:10px,ry:10px
    classDef useCases fill:#e0f7fa,stroke:#00838f,stroke-width:1px
    classDef ucItem fill:white,stroke:none
    classDef ucLabel fill:none,stroke:none
    classDef nodes fill:white,stroke-width:1px

    class SRC sources
    class STORE store
    class VISUAL visual
    class CONS consumers
    class RA,MR,UX,SS,FE,CI,GRAF,DEV,PERF nodes
```
Key Use Cases
- MR Pipeline Performance Validation
- Trend Analysis
- Dynamic Baselines (see the query sketch after this list)
- Regression Detection
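For the Dynamic Baselines and Regression Detection use cases, the baseline could be computed from recent history at query time rather than hard-coded. A minimal sketch, assuming the placeholder schema from the earlier example and a 30-day window:

```python
from influxdb_client import InfluxDBClient

# Flux query: median throughput for one test/environment over the last 30 days.
# Measurement, tag, and field names follow the hypothetical schema used above.
BASELINE_QUERY = '''
from(bucket: "perf-test-metrics")
  |> range(start: -30d)
  |> filter(fn: (r) => r._measurement == "gpt_endpoint_result")
  |> filter(fn: (r) => r.test_name == "api_v4_projects_project")
  |> filter(fn: (r) => r.environment == "10k-reference-architecture")
  |> filter(fn: (r) => r._field == "rps_result")
  |> median()
'''


def dynamic_baseline(client: InfluxDBClient) -> float | None:
    """Return the historical median RPS, or None if there is no history yet."""
    for table in client.query_api().query(BASELINE_QUERY):
        for record in table.records:
            return record.get_value()
    return None
```

The aggregation function and window are open questions; a median over a rolling window is simply one option that is robust to occasional outliers.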
Data Retention
Performance testing across numerous MRs will generate substantial data volumes. We need a thoughtful retention strategy to balance analytical value with storage constraints. Potential approaches include:
- Selective Storage - Only persist results from merged MRs, treating pipeline runs on unmerged MRs as transient data
- Data Aggregation - Implement a periodic process to consolidate historical data, preserving trends while reducing granularity of older measurements
- Time-based Retention - Maintain full fidelity for recent data (for example, 30-90 days), then progressively reduce resolution for older data
- Significance-based Pruning - Retain all data points that represent significant changes or anomalies, while sampling or aggregating data that follows expected patterns
- Environment-based Policies - Apply different retention rules based on the source environment, for example longer retention for Reference Architecture runs than for MR pipeline tests
Our initial implementation will focus on establishing the core infrastructure while we evaluate these approaches based on actual usage patterns.
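As one concrete illustration of combining the time-based and aggregation options, bucket retention plus a periodic downsampling pass might look roughly like this with the influxdb-client Python package. The 90-day window and the long-term bucket name are assumptions, not a decided policy.

```python
from influxdb_client import BucketRetentionRules, InfluxDBClient


def apply_time_based_retention(client: InfluxDBClient) -> None:
    """Keep ~90 days of full-fidelity data; roll older data up to daily aggregates."""
    buckets_api = client.buckets_api()

    # Expire raw points after 90 days (a figure taken from the example range above).
    bucket = buckets_api.find_bucket_by_name("perf-test-metrics")
    bucket.retention_rules = [
        BucketRetentionRules(type="expire", every_seconds=90 * 24 * 60 * 60)
    ]
    buckets_api.update_bucket(bucket=bucket)

    # Downsample older data into a hypothetical long-term bucket. In practice this
    # would be registered as a scheduled InfluxDB task rather than run ad hoc.
    client.query_api().query('''
        from(bucket: "perf-test-metrics")
          |> range(start: -1y, stop: -90d)
          |> aggregateWindow(every: 1d, fn: mean, createEmpty: false)
          |> to(bucket: "perf-test-metrics-downsampled")
    ''')
```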
Sample Workflows
As a developer, I want to know if my change affects performance
```mermaid
sequenceDiagram
    participant DS as InfluxDB
    participant MR as MR Pipeline
    participant DEV as Developer
    note over MR: Run performance test
    MR->>DS: Query baseline results
    DS->>MR: Return historical data
    MR->>DS: Write results
    note over MR: Determine Pass/Fail status
    MR->>DEV: Report status
```
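In the MR pipeline, this workflow could reduce to a small gate step: fetch the dynamic baseline, compare the fresh result against it with a tolerance, write the new result, and set the job status. The sketch below covers only the comparison; the 10% tolerance is illustrative, and the baseline value would come from a query like the one shown earlier.

```python
import sys

# Fail the job if throughput drops more than 10% below the dynamic baseline
# (the tolerance value is illustrative, not a decided threshold).
REGRESSION_TOLERANCE = 0.10


def gate(current_rps: float, baseline_rps: float | None) -> bool:
    """Pass when no baseline exists yet, or when the drop stays within tolerance."""
    if baseline_rps is None:
        return True  # no history yet: record the result and pass
    return current_rps >= baseline_rps * (1.0 - REGRESSION_TOLERANCE)


if __name__ == "__main__":
    # Usage sketch: gate.py <current_rps> [<baseline_rps>]
    # where the baseline comes from a query like dynamic_baseline() above.
    current = float(sys.argv[1])
    baseline = float(sys.argv[2]) if len(sys.argv) > 2 else None
    if not gate(current, baseline):
        print(f"Performance regression: {current:.1f} RPS vs baseline {baseline:.1f} RPS")
        sys.exit(1)  # non-zero exit fails the MR pipeline job
    print("Performance check passed")
```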
As a Stakeholder, I want to be able to investigate performance trends
```mermaid
sequenceDiagram
    participant UX as UX Performance Run
    participant RA as Reference Architecture Run
    participant MR as MR Pipeline
    participant DS as InfluxDB
    participant GF as Grafana
    participant S as Stakeholder
    MR->>DS: Write results
    RA->>DS: Write results
    UX->>DS: Write results
    DS->>GF: Results
    GF->>S: Visualized Results
```
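The same data then backs the stakeholder view: a Grafana panel on the connected InfluxDB data source, or an ad-hoc script, could chart trends per GitLab version and environment with a query along these lines (again assuming the placeholder schema).

```python
from influxdb_client import InfluxDBClient

# The kind of Flux query a Grafana panel or analysis script could run to chart
# weekly mean throughput per GitLab version and environment over a quarter.
TREND_QUERY = '''
from(bucket: "perf-test-metrics")
  |> range(start: -90d)
  |> filter(fn: (r) => r._measurement == "gpt_endpoint_result")
  |> filter(fn: (r) => r._field == "rps_result")
  |> group(columns: ["gitlab_version", "environment"])
  |> aggregateWindow(every: 1w, fn: mean, createEmpty: false)
'''


def print_trends(client: InfluxDBClient) -> None:
    """Print one row per version/environment/week for a quick trend check."""
    for table in client.query_api().query(TREND_QUERY):
        for r in table.records:
            print(r.values.get("gitlab_version"), r.values.get("environment"),
                  r.get_time().date(), round(r.get_value(), 1))
```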
Alternative Solutions
- Run both a baseline run and the performance run in every MR
  - Pros:
    - Provides an immediate, direct comparison without relying on historical data
    - Guarantees the latest reference point for comparison
    - Eliminates concerns about environmental or temporal variations
  - Cons:
    - Dramatically increases CI resource consumption and pipeline duration
    - Creates redundant test runs of the same baseline code
    - Doubles the testing time for every performance-relevant MR
    - Scales poorly as more performance tests are added to the suite
- Use static, hard-coded baselines
  - Pros:
    - Simple implementation with minimal infrastructure needs
    - Consistent reference points for comparison
    - Low maintenance overhead for implementation
  - Cons:
    - Quickly becomes outdated as the application evolves
    - Fails to account for legitimate performance changes over time
    - Requires manual updates to adjust expectations
- Use existing per-environment Prometheus instances
  - Pros:
    - Leverages existing monitoring infrastructure
    - Data is already collected and available
  - Cons:
    - Data is isolated within each environment
    - Adds complexity to the CI run to determine which data source to use
    - Test runs vary, so the needed data may not be present
- Build a custom performance analytics platform
  - Pros:
    - Fully tailored to our specific performance testing needs
    - Maximum flexibility in data model and analysis capabilities
  - Cons:
    - Significantly higher development and maintenance effort
    - Longer time to initial value
    - Requires specialized skills to build and maintain
    - Reinvents capabilities already available in existing tools
- Use object storage (S3/GCS/Package Registry) for JSON baselines
  - Pros:
    - Simple implementation with minimal infrastructure dependencies
    - Easy integration with CI/CD pipelines and existing tools
    - Straightforward version control of baseline files
    - Low operational overhead with highly reliable storage
    - Cost-effective for the amount of data involved
  - Cons:
    - Limited query capabilities for dynamic analysis and investigation
    - No built-in visualization or trending capabilities
    - Requires custom tooling for comparison and regression detection
    - Difficult to perform ad-hoc analysis or identify patterns across multiple tests
    - Doesn't scale well for storing full test result datasets; only suitable for baselines
We opted for the current approach because it leverages our existing infrastructure investments while enabling the dynamic baselines critical for effective MR performance testing. This solution offers the best balance of implementation speed, analytical capabilities, and long-term scalability.
References
- GitLab Performance Tool (GPT)
- GPT Benchmarks Wiki
- Reference Architecture Test Environment Details
- Replace InfluxDB with Prometheus InfluxDB exporter
- Shift Left and Right Performance Testing
- End-to-End Pipeline Monitoring
