Cells Networking

This page contains information related to upcoming products, features, and functionality. It is important to note that the information presented is for informational purposes only. Please do not rely on this information for purchasing or planning purposes. The development, release, and timing of any products, features, or functionality may be subject to change or delay and remain at the sole discretion of GitLab Inc.
Status Authors Coach DRIs Owning Stage Created
accepted sxuereb tkhandelwal3 tkhandelwal3 devops platforms 2025-02-21

Pre-Reads

Guidelines

  • Cross-VPC communication in GCP: Any interaction between services in different VPCs. For instance, a Cell communicating with the Topology Service will use Private Service Connect, which avoids overlapping CIDRs, VPC peering limitations, inconsistent DNS resolution across VPCs, and traffic going over the public internet.
  • Authentication and Authorization: Each service needs an identity, in the form of a certificate or a token, that it presents when communicating over HTTP. The same identity is also used to authorize the request. Certificate-based identity is better known as mTLS and has a separate blueprint.
  • Cells are not public but remain individually accessible: Users must submit all requests through the gitlab.com domain on the public internet to maintain a central entry point and a global counter for rate limits.
    • External: The cells are individually accessible through an external hostname but this is not public - customers cannot directly interact with the cell domains.
    • Public: The cells are only accessible to customers through the gitlab.com domain which is available on the public internet.

With the guidelines above we will end up with the following communication between

Cross-VPC communication in GCP

Each Cell will have its own GCP project and therefore its own VPC. This means that when a Cell sends requests to downstream services, the traffic must travel either over the public internet or over an internal network path. In ADR 004 we decided to use Private Service Connect for internal communication. Requests between services should stay internal for speed and cost reasons; otherwise, we would have to pay ingress and egress fees. Private Service Connect traffic goes directly from consumer clients to producer backends without intermediate hops or proxies. NAT (Network Address Translation) is performed directly on the physical host machines that host the consumer and producer VMs, which reduces latency and increases bandwidth capacity.

Private Service Connect will be used only for communication between GCP-hosted services. We can’t use it between the HTTP Router and the Topology Service, since the HTTP Router is not hosted in GCP.

With Private Service Connect there are two entities to keep in mind: the Consumer and the Producer. For example, a Cell (Consumer) sending a request to Topology Service (Producer).

A Consumer can access the Producer in two ways: through an Endpoint or through a Backend. We will use a Backend, since Private Service Connect backends use a load balancer configured with Private Service Connect network endpoint group (NEG) backends. Accessing APIs and services through a consumer-managed load balancer provides several benefits:

  • Load balancers can act as a centralized policy enforcement point where security policies (such as Google Cloud Armor policies and SSL policies) or routing policies (such as Google Cloud URL maps) are enforced.
  • They provide centralized metrics and logging that a published service might not provide.
  • They allow consumers to control their routing and failover.
  • All load balancers should be multi-regional by default since we have Cells running with Geo for failover, and Topology Service running in two regions.

[Diagram: how Private Service Connect will work]

Authentication and Authorization

We will use a mix of mTLS and Cloudflare Zero Trust. Each Cell should have an identity in the form of a certificate, which it uses to authenticate when connecting to the Topology Service. It’s up to the Topology Service to implement authorization using the identity provided in the certificate. For example, a Cell connects to the Topology Service using mTLS for authentication, and the Topology Service then uses that identity to decide whether the Cell is authorized to make that request.
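The split between authentication (mTLS) and authorization (done by the Topology Service) can be sketched as follows. This is a minimal illustration, not the Topology Service implementation: the identity names, the method names, and the `ALLOWED_METHODS` table are hypothetical, and the certificate is assumed to have the dictionary shape that Python's `ssl.SSLSocket.getpeercert()` returns after a successful mTLS handshake.

```python
from typing import Any, Optional

# Hypothetical authorization table: which Cell identity may call which method.
# Identities and method names are illustrative only.
ALLOWED_METHODS: dict[str, set[str]] = {
    "cell-1.example.internal": {"ExampleService/Read", "ExampleService/Write"},
    "cell-2.example.internal": {"ExampleService/Read"},
}


def identity_from_peer_cert(peer_cert: dict[str, Any]) -> Optional[str]:
    """Extract the subject commonName from a getpeercert()-style dict."""
    # "subject" is a tuple of RDNs; each RDN is a tuple of (key, value) pairs.
    for rdn in peer_cert.get("subject", ()):
        for key, value in rdn:
            if key == "commonName":
                return value
    return None


def is_authorized(peer_cert: dict[str, Any], method: str) -> bool:
    """Authorize a request using the identity the TLS layer already verified."""
    identity = identity_from_peer_cert(peer_cert)
    if identity is None:
        return False  # no identity in the certificate: deny
    # Unknown identities get an empty permission set, so the default is deny.
    return method in ALLOWED_METHODS.get(identity, set())
```

The TLS handshake proves who the caller is; the lookup decides what that caller may do. Keeping the two steps separate means the permission table can change without reissuing certificates.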

Authentication Protocols and Connection Matrix

| Client | Server | Protocol | Mechanism |
|---|---|---|---|
| Cloudflare / HTTP Router | Topology Service | HTTP | Cloudflare Zero Trust (using Service Token) |
| Cloudflare / HTTP Router | GitLab Webserver/Cell Zone | HTTP | Cloudflare Zero Trust (using Service Token) |
| GitLab Webserver | Topology Service | gRPC | Normal mTLS handled by the webservers (since the connection goes through Private Service Connect, it does not pass through Cloudflare) |
| Operators | Cell Zone/GitLab Webserver | HTTP | Zero Trust with IdP auth |
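The matrix above can also be read as a lookup keyed by (client, server). The sketch below encodes the four rows as data and treats any pair that is not listed as unsupported. The `Connection` type, the shortened party names, and the `lookup` helper are assumptions made for illustration and are not part of the design.

```python
from typing import NamedTuple, Optional


class Connection(NamedTuple):
    protocol: str
    mechanism: str


# One entry per row of the connection matrix; the names are shortened labels.
CONNECTION_MATRIX: dict[tuple[str, str], Connection] = {
    ("HTTP Router", "Topology Service"): Connection(
        "HTTP", "Cloudflare Zero Trust service token"
    ),
    ("HTTP Router", "GitLab Webserver"): Connection(
        "HTTP", "Cloudflare Zero Trust service token"
    ),
    ("GitLab Webserver", "Topology Service"): Connection("gRPC", "mTLS"),
    ("Operator", "GitLab Webserver"): Connection(
        "HTTP", "Cloudflare Zero Trust with IdP auth"
    ),
}


def lookup(client: str, server: str) -> Optional[Connection]:
    """Return how `client` reaches `server`, or None if the pair is not listed."""
    return CONNECTION_MATRIX.get((client, server))
```

The lookup is directional: the Topology Service calling the HTTP Router is not a row in the matrix, so it resolves to `None`.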

Cells are not public but remain individually accessible

All public traffic will flow through the gitlab.com domain as it exists today. This gives us a central point of entry making it easier for observability, audit tracking, forward/backward compatibility, and a centralized place to configure and manage Web Application Firewalls and Rate limits.

Each Cell will still have its own dedicated domain, known as managed_domain in the tenant model, which will be registered in Cloudflare. This builds on Authentication and Authorization, where each workload has its own identity, and gives the Topology Service and HTTP Router an identifier for a Cell. The Cell should be directly accessible not only programmatically by clients but also by human operators, who bypass the HTTP Router and connect to the Cell directly for operational reasons, such as debugging the Rails application without the HTTP Router and Topology Service.

This means that a Cell has two types of clients: human operators and other services such as the HTTP Router and Instrumentor. Both are handled with the same core technology, Cloudflare Zero Trust.

Human Operator

With Cloudflare Zero Trust we can set up an identity provider such as Okta, so that accessing the managed_domain in a browser requires logging in to Okta before continuing to the Cell.

We can configure Access Policies on our application in Cloudflare Zero Trust to restrict access to operators who belong to a particular Okta group.

This has been validated by the Cloudflare Zero Trust PoC.

Services

We can use Cloudflare Zero Trust service token authentication for secure communication from our HTTP Router to the Cell and the Topology Service.

The service token can be uploaded as Worker secrets and added as headers when proxying the request to a Cell.

We validated how this would work as part of the Cloudflare Zero Trust PoC.
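Attaching the service token while proxying can be sketched as a pure function over request headers. `CF-Access-Client-Id` and `CF-Access-Client-Secret` are the header names Cloudflare Access uses for service tokens. Everything else here is an assumption for illustration: the function name is hypothetical, the credentials would come from Worker secrets in the real HTTP Router, and Python is used only as neutral notation since the HTTP Router itself runs as a Cloudflare Worker.

```python
from typing import Mapping

# Header names used by Cloudflare Access for service token authentication.
CLIENT_ID_HEADER = "CF-Access-Client-Id"
CLIENT_SECRET_HEADER = "CF-Access-Client-Secret"


def with_service_token(
    headers: Mapping[str, str], client_id: str, client_secret: str
) -> dict[str, str]:
    """Return a copy of `headers` carrying the router's service token."""
    reserved = {CLIENT_ID_HEADER.lower(), CLIENT_SECRET_HEADER.lower()}
    # Drop any client-supplied copies first (HTTP header names are
    # case-insensitive) so an end user cannot smuggle in their own credentials.
    proxied = {k: v for k, v in headers.items() if k.lower() not in reserved}
    proxied[CLIENT_ID_HEADER] = client_id
    proxied[CLIENT_SECRET_HEADER] = client_secret
    return proxied
```

For example, calling `with_service_token({"Accept": "application/json"}, client_id, client_secret)` keeps the `Accept` header and adds the two token headers, leaving the incoming mapping unmodified.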