ClickHouse Self-Managed component costs and maintenance requirements
Summary
ClickHouse requires additional cost and maintenance for self-managed customers:
- Resource allocation cost: ClickHouse requires a considerable amount of resources to run optimally.
- The minimum cost estimate shows that running ClickHouse is practical only for very large Reference Architectures: 25k users and up.
- High availability: ClickHouse SaaS supports HA, but there is currently no documented HA configuration for self-managed deployments.
- Geo setups: Sync and replication complexity for GitLab Geo setups.
- Upgrades: An additional database to maintain and upgrade alongside the existing PostgreSQL database. This includes tracking which ClickHouse versions are compatible with each GitLab version and keeping both up to date.
- Backup and restore: Self-managed customers need an engineer who is familiar with ClickHouse backup strategies and disaster recovery processes, or they need to switch to ClickHouse SaaS.
- Monitoring: ClickHouse can be monitored with Prometheus, but it is one more component to monitor and troubleshoot.
- Limitations: Azure object storage is not supported. GitLab does not have the documentation or support expertise to assist customers with deployment and operation of self-managed ClickHouse.
- ClickHouse SaaS: Customers who run a self-managed GitLab instance because of regulatory or compliance requirements, or who have latency concerns, likely cannot use ClickHouse SaaS.
Minimum self-managed component costs
Based on an analysis of the ClickHouse spec requirements and in collaboration with the ClickHouse team, we identified the following minimal configurations for self-managed ClickHouse:
- ClickHouse High Availability (HA)
  - ClickHouse - 2 machines, each with >=16 cores, >=64 GB RAM, SSD, and a 10 Gbit/s network connection. Each machine also runs Keeper.
  - Keeper - 1 machine with 2 CPUs, 4 GB of RAM, and an SSD with high IOPS.
- ClickHouse non-HA
  - ClickHouse - 1 machine with >=16 cores, >=64 GB RAM, SSD, and a 10 Gbit/s network connection.
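As a rough illustration, the minimum configurations above can be written down as data and totalled per layout. This is a minimal sketch, not a sizing tool: the `Machine` class, the role labels, and the `totals` helper are hypothetical names introduced here, and the Keeper machine's "2 CPU" is assumed to mean 2 cores.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Machine:
    """One machine type in a layout (hypothetical helper, not a GitLab or ClickHouse API)."""
    role: str
    count: int
    min_cores: int
    min_ram_gb: int


# Minimum values copied from the list above; real deployments may need more.
HA_LAYOUT = [
    Machine("clickhouse+keeper", count=2, min_cores=16, min_ram_gb=64),
    Machine("keeper", count=1, min_cores=2, min_ram_gb=4),  # assumes "2 CPU" means 2 cores
]
NON_HA_LAYOUT = [
    Machine("clickhouse", count=1, min_cores=16, min_ram_gb=64),
]


def totals(layout):
    """Return (machine count, total cores, total RAM in GB) for a layout."""
    return (
        sum(m.count for m in layout),
        sum(m.count * m.min_cores for m in layout),
        sum(m.count * m.min_ram_gb for m in layout),
    )


print("HA:", totals(HA_LAYOUT))
print("non-HA:", totals(NON_HA_LAYOUT))
```

Under these assumptions, the minimum HA layout needs three machines and roughly twice the compute of the non-HA layout, before disk and network are considered.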
The following cost table was compiled using the machine CPU and memory requirements for ClickHouse, and comparing them to the GitLab Reference Architecture sizes and costs from the GCP calculator.
Reference Architecture | ClickHouse type | ClickHouse cost / (GitLab cost + ClickHouse cost) |
---|---|---|
20 RPS / 1k users - non HA | non-HA | 78.01% |
40 RPS / 2k users - non HA | non-HA | 44.50% |
60 RPS / 3k users - HA | HA | 37.87% |
100 RPS / 5k users - HA | HA | 30.92% |
200 RPS / 10k users - HA | HA | 20.47% |
500 RPS / 25k users - HA | HA | 14.30% |
1000 RPS / 50k users - HA | HA | 8.16% |
NOTE: This ClickHouse Self-Managed component evaluation is a minimum cost estimate based on a simplified architecture.
The following components increase the cost, and were not considered in the minimum calculation:
- Disk size - depends on data size, hard to estimate.
- Disk types - ClickHouse recommends fast SSDs.
- Network usage - ClickHouse recommends using a 10 Gbit/s network, if possible.
- HA sizing - the same minimum HA cost is applied to every reference architecture from 60 RPS / 3k users to 1000 RPS / 50k users, although HA specs tend to increase with user count.
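The percentage column in the cost table is the ratio ClickHouse cost / (GitLab cost + ClickHouse cost). The sketch below computes that ratio and inverts it, which shows how large the GitLab cost is relative to the ClickHouse cost used for each row. The function names are hypothetical, and only the published percentages are used; no absolute costs are assumed.

```python
def clickhouse_share(clickhouse_cost: float, gitlab_cost: float) -> float:
    """ClickHouse cost / (GitLab cost + ClickHouse cost), as a fraction."""
    return clickhouse_cost / (gitlab_cost + clickhouse_cost)


def gitlab_cost_multiple(share: float) -> float:
    """Invert the ratio: GitLab cost in units of the ClickHouse cost."""
    if not 0 < share < 1:
        raise ValueError("share must be strictly between 0 and 1")
    return (1 - share) / share


# Percentages copied from the cost table above.
TABLE_SHARES_PCT = {
    "20 RPS / 1k users": 78.01,
    "40 RPS / 2k users": 44.50,
    "60 RPS / 3k users": 37.87,
    "100 RPS / 5k users": 30.92,
    "200 RPS / 10k users": 20.47,
    "500 RPS / 25k users": 14.30,
    "1000 RPS / 50k users": 8.16,
}

for name, pct in TABLE_SHARES_PCT.items():
    multiple = gitlab_cost_multiple(pct / 100)
    print(f"{name}: GitLab cost is about {multiple:.2f}x the ClickHouse cost")
```

Read this way, a share above 50% (the 1k-user row) means ClickHouse alone costs more than the rest of the GitLab deployment, while at 50k users the GitLab deployment costs roughly eleven times as much as ClickHouse.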
Resources
- Research and understand component costs and maintenance requirements of running a ClickHouse instance with GitLab
- ClickHouse for Error Tracking on GitLab.com
Last modified August 23, 2024: Ensure frontmatter is consistent (e47101dc)