Google Artifact Registry Integration
This page contains information related to upcoming products, features, and functionality.
It is important to note that the information presented is for informational purposes only.
Please do not rely on this information for purchasing or planning purposes.
The development, release, and timing of any products, features, or functionality may be
subject to change or delay and remain at the sole discretion of GitLab Inc.
Summary
GitLab and Google Cloud have recently announced a partnership to combine the unique capabilities of their platforms.
As highlighted in the announcement, one key goal is the ability to
“use Google’s Artifact Registry with GitLab pipelines and packaging to create a security data plane”.
The initial step toward this goal is to allow users to configure a new
Google Artifact Registry (abbreviated as GAR from now on)
project integration and display
container image artifacts
in the GitLab UI.
Motivation
Refer to the announcement
blog post for more details about the motivation and long-term goals of the GitLab and Google Cloud partnership.
Regarding the scope of this design document, our primary focus is to fulfill the
Product requirement of providing users with visibility over their container
images in GAR. The motivation for this specific goal is rooted in foundational
research on the use of external registries as a complement to the GitLab
container registry (internal).
Since this marks the first step in the GAR integration, our aim is to achieve
this goal in a way that establishes a foundation to facilitate reusability in
the future. This groundwork could benefit potential future expansions, such as
support for additional artifact formats (npm, Maven, etc.), and features beyond
the Package stage (e.g., vulnerability scanning, deployments, etc.).
Goals
- Allow GitLab users to configure a new project integration for connecting to GAR.
- Limited to a single top-level GAR repository per GitLab project.
- Limited to GAR repositories in Standard mode. Support for Remote and Virtual repository modes (both in Preview) is a strech goal.
- Limited to GAR repositories of format Container images.
- Use a Google Cloud service account provided by the GitLab project owner/maintainer to interact with GAR.
- Allow GitLab users to list container images under the connected GAR repository, including sub-repositories. The list should be paginable and sortable.
- For each listed image, display its URI, list of tags, size, digest, upload time, media type, build time, and update time, as documented here.
- Listing container images under the connected GAR repository is restricted to users with Reporter+ roles.
Non-Goals
While some of these may become goals for future iterations, they are currently out of scope:
Proposal
Design and Implementation Details
Project Integration
A new project integration for GAR will be created. Once enabled, this will display a new “Google Artifact Registry” item in the “Operate” section of the sidebar. This is also where the Harbor integration is displayed if enabled.
The GAR integration can be enabled by project owner/maintainer(s), who must provide four configuration parameters during setup:
- GCP project ID: The globally unique identifier for the GCP project where the target GAR repository lives.
- Repository location: The GCP location where the target GAR repository lives.
- Repository name: The name of the target GAR repository.
- GCP service account key: The content (not the file) of the service account key in JSON format (sample).
Authentication
The integration is simplified by using a single GCP service account for the integration. Users retain the ability to audit usage of this service account on the GCP side and revoke permissions if/when necessary.
The service account key provided during the integration setup must be granted at least with the Artifact Registry Reader
role in the target GCP project.
Saving the (encrypted) service account key JSON content in the backend allows us to easily grab and use it to initialize the GAR client (more about that later). Providing the content of the key file instead of uploading it is similar to what we do with users’ public SSH keys.
As previously highlighted, access to the GAR integration features is restricted to users with Reporter+ roles.
Resource Mapping
For the GitLab container registry,
repositories within a specific project must have a path that matches the project full path.
This is essentially how we establish a resource mapping between GitLab Rails and
the registry, which serves multiple purposes, including granular authorization,
scoping storage usage to a given project/group/namespace, and more.
Regarding the GAR integration, since there is no equivalent entities for GitLab
project/group/namespace resources on the GAR side, we aim to simplify matters by
allowing users to attach any GAR repository to any
GitLab project, regardless of their respective paths. Similarly, we do not plan
to restrict the attachment of a particular GAR repository to a single GitLab
project. Ultimately, it is up to users to determine how to organize both
datasets in the way that best suits their needs.
GAR API
GAR provides three APIs: Docker API, REST API, and RPC API.
The Docker API
is based on the Docker Registry HTTP API V2,
now superseded by the OCI Distribution Specification API
(from now on referred to as OCI API). This API is used for pushing/pulling images to/from
GAR and also provides some discoverability operations. Refer to Alternative Solutions
for the reasons why we don’t intend to use it.
Among the proprietary GAR APIs, the REST API
provides basic functionality for managing repositories. This includes
list
and get
operations for container image repositories, which could be used for this integration.
Both operations return the same data structure, represented by the
DockerImage
object, so both provide the same level of detail.
Last but not least, there is also an RPC API,
backed by gRPC and Protocol Buffers. This API provides the most functionality,
covering all GAR features. From the available operations, we can make use of the
ListDockerImagesRequest
and GetDockerImageRequest
operations. As with the REST API, both responses are composed of
DockerImage
objects.
Between the two proprietary API options, we chose the RPC one because it
provides support not only for the operations we need today but also offers
better coverage of all GAR features, which will be beneficial in future
iterations. Finally, we do not intend to make direct use of this API but rather
use it through the official Ruby client SDK. See Client SDK below for more details.
Backend Integration
This integration will need several changes on the backend side of the rails project. See the backend page for additional details.
UI/UX
This integration will include a dedicated page named “Google Artifact Registry,”
listed under the “Operate” section of the sidebar. This page will enable users to
view the list of all container images in the configured GAR repository. See the
UI/UX page for additional details.
GraphQL APIs
TODO: Describe any GraphQL APIs or changes to existing APIs that will be needed for this integration.
Alternative Solutions
Use Docker/OCI API
One alternative solution considered was to use the Docker/OCI API provided by
GAR, as it is a common standard for container registries. This approach would
have allowed GitLab to reuse existing logic
for connecting to container registries, which could potentially speed up
development. However, there were several drawbacks to this approach:
-
Authentication Complexity: The API requires authentication tokens, which need to be requested at the login endpoint. These tokens have limited validity, adding complexity to the authentication process. Handling expiring tokens would have been necessary.
-
Limited Focus: The API is solely focused on container registry objects, which does not align with the goal of creating a flexible integration framework for adopting additional GAR artifacts (e.g. package registry formats) down the road.
-
Discoverability Limitations: The API has severe limitations when it comes to discoverability, lacking features like filtering or sorting.
-
Multiple Requests: To retrieve all the required information about each image, multiple requests to different endpoints (listing tags, obtaining image manifests, and image configuration blobs) would have been necessary, leading to a 1+N
performance issue.
GitLab had previously faced significant challenges with the last two limitations,
prompting the development of a custom
GitLab container registry API
to address them. Additionally, GitLab decided to
deprecate support
for connecting to third-party container registries using the Docker/OCI API due to
these same limitations and the increased cost of maintaining two solutions in
parallel. As a result, there is an ongoing effort to replace the use of the
Docker/OCI API endpoints with custom API endpoints for all container registry functionalities in GitLab.
Considering these factors, the decision was made to build the GAR integration
from scratch using the proprietary GAR API. This approach provides more flexibility
and control over the integration and can serve as a foundation for future expansions,
such as support for other GAR artifact formats.
Client SDK
To interact with GAR we will make use of the official GAR Ruby client SDK.
By default, this client will use the RPC version of the Artifact Registry API.
To build the client, we will need the service account key.
Interesting functions
For the scope of this blueprint, we will need to use the following functions from the Ruby client:
Limitations
Filtering is not available in #list_docker_images
. In other words, we can’t filter the returned list (for example on a specific name). However, ordering on some columns is available.
Structure and Organization
Unlike the GitLab container registry (and therefore the Docker Registry and OCI Distribution), GAR does not treat tags as the primary “artifacts” in a repository. Instead, the primary “artifacts” are the image manifests. For each manifest object (represented by DockerImage
), there is a list of assigned tags (if any). Consequently, when listing the contents of a repository through the GAR API, the response comprises a collection of manifest objects (along with their associated tags as properties), rather than a collection of tag objects. Additionally, due to this design choice, untagged manifests are also present in the response.