Cloud Connector architecture evolution

This page contains information related to upcoming products, features, and functionality. It is important to note that the information presented is for informational purposes only. Please do not rely on this information for purchasing or planning purposes. The development, release, and timing of any products, features, or functionality may be subject to change or delay and remain at the sole discretion of GitLab Inc.
Status: implemented
Authors: mkaeppler
Coach: ayufan
DRIs: rogerwoo, pjphillips
Owning Stage: devops data stores
Created: 2023-09-28

Summary

This design doc covers architectural decisions and proposed changes to Cloud Connector’s technical foundations. Refer to the official architecture documentation for an accurate description of the current status.

Motivation

Our “big problem to solve” is to bring feature parity to our SaaS and self-managed offerings. Until now, SaaS and self-managed (SM) GitLab instances have consumed features only from the AI gateway, which also implements an Access Layer that verifies whether a given request is allowed to access the respective AI feature endpoint.

This approach has served us well because it:

  • Required minimal changes from an architectural standpoint to allow SM users to consume AI features hosted by us.
  • Caused minimal friction with ongoing development on GitLab.com.
  • Reduced time to market.

However, the AI gateway alone does not sufficiently abstract over a wider variety of features, as by definition it is designed to serve AI features only.

Goals

We will use this blueprint to make incremental changes to Cloud Connector’s technical framework to enable other backend services to service self-managed/GitLab Dedicated customers in the same way the AI gateway does today. This will directly support our mission of bringing feature parity to all GitLab customers.

The major areas we are focused on are:

  • Provide a single access point for customers. We found that customers are not keen on configuring their web proxies and firewalls to allow outbound traffic to an ever-growing list of GitLab-hosted services. We therefore decided to install a global, load-balanced entry point at cloud.gitlab.com. This entry point can make simple routing decisions based on the requested path, which allows us to target different backend services as we broaden the feature scope covered by Cloud Connector.
    • Status: done. The decision was documented as ADR-001.
  • Remove OIDC key discovery. The original architecture for Cloud Connector relied heavily on OIDC discovery to fetch JWT validation keys. OIDC discovery is prone to networking and caching problems and adds complexity to solve a problem we don’t have. Our proposed alternative to OIDC discovery is to package the public keys used for token validation from our well-known token issuers with Cloud Connector backends directly instead of fetching them over the network.
    • Status: planned. The decision was documented as ADR-002.
  • Rate-limiting features. During periods of elevated traffic, backends integrated with Cloud Connector such as AI gateway or TanuKey may experience resource constraints. GitLab should apply a consistent strategy when deciding which instance should be prioritized over others. This strategy should be uniform across all Cloud Connector services.
    • Status: planned.

Decisions


Cloud Connector ADR 001: Load balancer as single entry point

Context

The original iteration of the blueprint suggested standing up a dedicated Cloud Connector edge service, through which all traffic that uses features under the Cloud Connector umbrella would pass.

The primary reasons we wanted this to be a dedicated service were to:

  1. Provide a single entry point for customers. We identified the ability for any GitLab instance around the world to consume Cloud Connector features through a single endpoint such as cloud.gitlab.com as a must-have property.
  2. Have the ability to execute custom logic. There was a desire from product to create a space where we can run cross-cutting business logic such as application-level rate limiting, which is hard or impossible to do using a traditional load balancer such as HAProxy.

Decision

We decided to take a smaller incremental step toward having a “smart router” by focusing on the ability to provide a single endpoint through which Cloud Connector traffic enters our infrastructure. This can be accomplished using simpler means than deploying dedicated services, specifically by pulling in a load balancing layer listening at cloud.gitlab.com that can also perform simple routing tasks to forward traffic into feature backends.
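As a rough illustration of the "simple routing tasks" described above, the following sketch maps a request path to a backend by longest matching prefix. It is not the load balancer configuration GitLab uses: the path prefixes, the backend names, and the `pick_backend` helper are all hypothetical and exist only to show the idea of path-based routing behind a single entry point.

```python
from typing import Optional

# Hypothetical routing table: path prefix -> backend name.
# None of these prefixes or backend names come from the design doc.
ROUTES = {
    "/ai/": "ai-gateway",
    "/ai/beta/": "ai-gateway-beta",
    "/observability/": "observability-backend",
}


def pick_backend(path: str) -> Optional[str]:
    """Return the backend for the longest matching path prefix, or None."""
    matches = [prefix for prefix in ROUTES if path.startswith(prefix)]
    if not matches:
        return None
    # Prefer the most specific (longest) prefix when several match.
    return ROUTES[max(matches, key=len)]
```

Adding a new feature backend then amounts to adding one routing entry, while customers continue to allow outbound traffic to a single host only.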

Cloud Connector ADR 002: Remove OIDC key discovery

Context

We are exploring approaches to move away from OIDC discovery for Cloud Connector token validation for the following reasons:

  • OIDC discovery is prone to networking and caching problems. If the endpoint or system is degraded or caches are stale, Cloud Connector backends can no longer validate any AI requests. With an increasing number of Cloud Connector backends, this issue is multiplied. An example is issue 480018 (internal for security reasons).
  • OIDC adds complexity to solve a problem we don’t have. It primarily lets third parties that don’t know or control each other exchange identity and key information through a standardized web interface. This is not necessary for Cloud Connector, because all systems involved that dispense or authenticate tokens are built, operated, and controlled by GitLab.
  • OIDC discovery requires a callback to gitlab.com from all Cloud Connector backends. To support Cells, where customers can reside in different Cells, the application secret holding these keys would have to be either shared across all Cells, or sharded, with each request routed so that the backend obtains the right key set. We can eliminate this problem entirely by not having gitlab.com publish these keys through OIDC endpoints and by removing this callback. See issue 451149 for more information.

Our proposed alternative to OIDC discovery is to package the public keys used for token validation from our well-known token issuers directly with Cloud Connector backends instead of fetching them over the network. Fetching them currently requires four network calls to succeed whenever a server process expires its key cache.
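To make the proposal more concrete, here is a minimal sketch of how a backend might select a validation key from a key set shipped with the service, with no network call involved. The issuer URLs, key IDs, PEM placeholders, and the `find_validation_key` helper are assumptions for illustration, not the actual Cloud Connector implementation. The sketch covers key lookup only: the token's signature must still be verified with the returned key (using a JWT library) before any claim is trusted.

```python
import base64
import json

# Hypothetical bundled key set: issuer -> {key ID -> PEM-encoded public key}.
# In this sketch the keys ship with the backend instead of being discovered.
BUNDLED_KEYS = {
    "https://issuer.example": {
        "key-1": "-----BEGIN PUBLIC KEY-----\n(placeholder)\n-----END PUBLIC KEY-----",
    },
    "https://other-issuer.example": {
        "key-9": "-----BEGIN PUBLIC KEY-----\n(placeholder)\n-----END PUBLIC KEY-----",
    },
}


def _decode_segment(segment: str) -> dict:
    """Decode one base64url-encoded JSON segment of a JWT."""
    padded = segment + "=" * (-len(segment) % 4)
    return json.loads(base64.urlsafe_b64decode(padded))


def find_validation_key(token: str) -> str:
    """Look up the bundled public key matching the token's issuer and key ID.

    The issuer is read from the unverified payload purely to select a key.
    Raises LookupError if no bundled key matches.
    """
    header_b64, payload_b64, _signature = token.split(".")
    header = _decode_segment(header_b64)
    claims = _decode_segment(payload_b64)
    keys_for_issuer = BUNDLED_KEYS.get(claims.get("iss"), {})
    try:
        return keys_for_issuer[header["kid"]]
    except KeyError:
        raise LookupError("no bundled key for this issuer and key ID") from None
```

One consequence of this design is that rotating a signing key requires shipping an updated key set to every backend, so in practice old and new keys would likely need to be bundled side by side for a transition period.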

Last modified September 19, 2024: Add Cloud Connector ADR 002 (4c413fa5)