ClickHouse Usage at GitLab
This page contains information related to upcoming products, features, and functionality.
It is important to note that the information presented is for informational purposes only.
Please do not rely on this information for purchasing or planning purposes.
The development, release, and timing of any products, features, or functionality may be
subject to change or delay and remain at the sole discretion of GitLab Inc.
Summary
ClickHouse is an open-source column-oriented database management system. It can efficiently filter, aggregate, and sum across large numbers of rows. In FY23, GitLab selected ClickHouse as its standard data store for features with big data and insert-heavy requirements, such as Observability and Analytics. This blueprint is a product of the ClickHouse working group. It serves as a high-level blueprint for ClickHouse adoption at GitLab and references other blueprints that address specific ClickHouse-related technical challenges.
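To illustrate why a column-oriented layout suits filter-and-aggregate workloads, here is a minimal sketch in plain Python. It is not how ClickHouse is implemented; the lists stand in for on-disk storage, and the table, column names, and values are hypothetical.

```python
# Illustrative sketch only: plain Python lists stand in for on-disk storage.
# The table, column names, and values are hypothetical.

# Row-oriented layout: each record stores all of its fields together.
rows = [
    {"project_id": 1, "status": "failed", "duration_s": 120},
    {"project_id": 2, "status": "success", "duration_s": 45},
    {"project_id": 1, "status": "success", "duration_s": 60},
    {"project_id": 3, "status": "failed", "duration_s": 300},
]

# Column-oriented layout: each field is stored as its own contiguous array.
columns = {
    "project_id": [r["project_id"] for r in rows],
    "status": [r["status"] for r in rows],
    "duration_s": [r["duration_s"] for r in rows],
}


def total_failed_duration_rows(records):
    """Visit every full record, even though only two fields are needed."""
    return sum(r["duration_s"] for r in records if r["status"] == "failed")


def total_failed_duration_columns(cols):
    """Read only the two relevant columns; project_id is never touched."""
    return sum(
        duration
        for status, duration in zip(cols["status"], cols["duration_s"])
        if status == "failed"
    )
```

Both functions return the same total, but the columnar version only reads the columns the query refers to. In a real column store this translates into less I/O per query and better compression of each column.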
Motivation
In FY23-Q2, the Monitor:Observability team developed and shipped a ClickHouse data platform to store and query data for Error Tracking and other observability features. Other teams have also begun to incorporate ClickHouse into their current or planned architectures. Given the growing interest in ClickHouse across product development teams, it is important to have a cohesive strategy for developing features using ClickHouse. This will allow teams to more efficiently leverage ClickHouse and ensure that we can maintain and support this functionality effectively for SaaS and self-managed customers.
Use Cases
Many product teams at GitLab are considering ClickHouse, both for developing new features and for improving the performance of existing ones.
At the start of the ClickHouse working group, we documented existing and potential use cases and found interest in ClickHouse from teams across all DevSecOps stage groups.
Goals
As ClickHouse has already been selected for use at GitLab, our main goal now is to ensure successful adoption of ClickHouse across GitLab. It is helpful to break down this goal according to the different phases of the product development workflow.
- Plan: Make it easy for development teams to understand if ClickHouse is the right fit for their feature.
- Develop and Test: Give teams the best practices and frameworks to develop ClickHouse-backed features.
- Launch: Support ClickHouse-backed features for SaaS and self-managed.
- Improve: Successfully scale our usage of ClickHouse.
Non-goals
Work on a strategy for integrating ClickHouse into GitLab Dedicated has not begun. Leadership guidance has been to wait until there is clearer demand for ClickHouse-backed features before prioritizing this.
Product roadmap
FY24 H2 (past)
In FY24 Q2, we began working to integrate ClickHouse with GitLab.com to support multiple features under development (see issue). We did not attempt to integrate with self-managed at that time because of the uncertain costs and management requirements for self-managed instances. This near-term implementation will be used to develop best practices and a strategy for guiding self-managed users. It will also continually shape our recommendations for self-managed instances that want to onboard ClickHouse early. As of FY24 Q3, ClickHouse is available for use with GitLab.com.
FY25 H1 (current)
After we have formulated best practices for managing ClickHouse ourselves on GitLab.com, we will begin to offer supported recommendations for self-managed instances that want to run ClickHouse themselves. During this phase, we will allow users to “Bring your own ClickHouse”, similar to our approach for Elasticsearch. For the features that require ClickHouse for optimal usage (Value Streams Dashboard, Product Analytics), this will be the initial go-to-market action. Notably, the Observability team has decided to support self-managed users through GitLab Cloud Connector instead of following this approach.
Long-term
We will work towards a packaged reference version of ClickHouse that self-managed users can manage easily and with minimal cost increases. We should be able to reliably instruct users on the management of ClickHouse and provide accurate costs for usage. Any feature could then depend on ClickHouse without reducing the number of users it can reach.
Best Practices
Best practices and guidelines for developing performant, secure, and scalable features using ClickHouse are located in the ClickHouse developer documentation.
Cost and maintenance analysis
The cost and maintenance analysis for ClickHouse components is located in the ClickHouse Self-Managed component costs and maintenance requirements.
Summary
ClickHouse requires additional cost and maintenance for self-managed customers:
- Resource allocation cost: ClickHouse requires a considerable amount of resources to run optimally.
- Minimum cost: The minimum cost estimate shows that setting up ClickHouse is practical only for very large Reference Architectures: 25k and up.
- High availability: ClickHouse SaaS supports HA. There is currently no documented HA configuration for self-managed.
- Geo setups: Sync and replication complexity for GitLab Geo setups.
- Upgrades: An additional database to maintain and upgrade alongside the existing PostgreSQL database. This also includes the compatibility work of mapping GitLab versions to ClickHouse versions and keeping them up to date.
- Backup and restore: Self-managed customers need an engineer who is familiar with backup strategies and the disaster recovery process in ClickHouse, or they need to switch to ClickHouse SaaS.
- Monitoring: ClickHouse can use Prometheus, but it is an additional component to monitor and troubleshoot.
- Limitations: Azure object storage is not supported. GitLab does not have the documentation or support expertise to assist customers with deployment and operation of self-managed ClickHouse.
- ClickHouse SaaS: Customers who run a self-managed GitLab instance and have regulatory or compliance requirements, or latency concerns, likely cannot use ClickHouse SaaS.
Minimum self-managed component costs
Based on an analysis of ClickHouse spec requirements and collaboration with the ClickHouse team, we identified the following minimal configurations for self-managed ClickHouse: