Self-Hosted Model Deployment

This page contains information related to upcoming products, features, and functionality. The information presented is for informational purposes only; do not rely on it for purchasing or planning purposes. The development, release, and timing of any products, features, or functionality may change or be delayed and remain at the sole discretion of GitLab Inc.
Status: proposed
Authors: sean_carroll, eduardobonet
Coach: jessieay
DRIs: susie.bee, m_gill
Owning Stage: devops ai-powered
Created: 2024-03-29

This Blueprint describes support for customer self-deployments of Mistral LLMs as a backend for GitLab Duo features, as an alternative to the default Vertex or Anthropic models offered on GitLab Dedicated and .com. This initiative supports both internet-connected and air-gapped GitLab deployments.

Motivation

Self-hosted LLM models allow customers to manage the end-to-end transmission of requests to enterprise-hosted LLM backends for GitLab Duo features, keeping all requests within their enterprise network. By default, GitLab provides Google Vertex and Anthropic as LLM backends, both hosted externally to GitLab. GitLab Duo feature developers are able to access other LLM choices via the AI Gateway. More details on model and region information can be found here.

Goals

Self-hosted models serve sophisticated customers capable of managing their own LLM infrastructure. GitLab provides the option to connect supported models to GitLab Duo features. The self-hosted models feature provides model-specific prompts and GitLab Duo feature support, and gives customers:

  • Choice of LLM models
  • Ability to keep all data and request/response logs within their own domain
  • Ability to select specific GitLab Duo Features for their users
  • Non-reliance on the .com AI Gateway

Non-Goals

Other features that are goals of the Custom Models group, and which may overlap with this work in the future, are explicitly out of scope for the current iteration of this blueprint. These include:

  • Local Models
  • RAG
  • Fine Tuning
  • GitLab-managed hosting of open source models, other than the currently supported third-party models.

Proposal

GitLab will provide support for specific LLMs hosted in a customer’s infrastructure. The customer will self-host the AI Gateway and one or more LLMs from a predefined list, and then configure their GitLab instance to use specific models per feature. A different model can be chosen for each GitLab Duo feature.

This feature is accessible at the instance level and is intended for use on GitLab Self-Managed instances.

Self-hosted model deployment is a GitLab Duo Enterprise Add-on.

Design and implementation details

Component Architecture

```mermaid
graph LR
    a1 --> c1
    a2 --> b1
    b1 --> c1
    b3 --> b1
    b4 --> b1
    c1 --> c2
    c2 --> c3
    c3 --> d1
    d1 --> d2

    subgraph "User"
      a1[IDE Request]
      a2[Web / CLI Request]
    end

    subgraph "Self-Managed GitLab"
      b1[GitLab Duo Feature] <--> b2[Model & Feature-specific<br/>Prompt Retrieval]
      b3[GitLab Duo Feature<br/>Configuration]
      b4[LLM Serving Config]
    end

    subgraph "Self-Hosted AI Gateway"
      c1[Inbound API interface]
      c2[Model routing]
      c3[Model API interface]
    end

    subgraph "Self-Hosted LLM"
      d1[LoadBalancer]
      d2[GPU-based backend]
    end
```

Diagram Notes

  • User request: A GitLab Duo feature is accessed from one of three possible starting points: the Web UI, an IDE, or the Git CLI. The IDE communicates directly with the AI Gateway.
  • LLM Serving Config: The existence of a customer-hosted model, along with its connectivity information, is declared in GitLab Rails and exposed to the AI Gateway via an API.
  • GitLab Duo Feature Configuration: For each supported GitLab Duo feature, a user may select a supported model and the associated prompts are automatically loaded.
  • Prompt Retrieval: GitLab Rails chooses and processes the correct prompt(s) based on the GitLab Duo feature and the model being used.
  • Model Routing: The AI Gateway routes the request to the correct external AI endpoint. The current default for GitLab Duo features is either Vertex or Anthropic. If a self-hosted model is used, the AI Gateway must route to the correct customer-hosted model’s endpoint. The customer-hosted model server details make up the LLM Serving Config and are retrieved from GitLab Rails via an API call. They may be cached in the AI Gateway.
  • Model API interface: Each model server has its own endpoint signature, and the AI Gateway needs to be able to communicate using the right one. We will support commonly used model-serving formats such as the OpenAI API specification (see the sketch below).
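
To make the model routing and model API interface steps concrete, here is a minimal sketch of how a gateway could forward a chat request to a self-hosted, OpenAI-compatible endpoint. The endpoint URL, model name, and function names are illustrative assumptions, not the AI Gateway’s actual implementation; only the OpenAI-style request shape comes from the specification mentioned above.

```python
import requests

# Hypothetical serving config, as it might be retrieved from GitLab Rails
# via the LLM Serving Config API and cached in the gateway.
LLM_SERVING_CONFIG = {
    "mistral-7b": "https://llm.internal.example.com/v1",  # assumed self-hosted endpoint
}

def route_chat_completion(model: str, messages: list) -> str:
    """Route a request to the configured model's OpenAI-compatible endpoint."""
    base_url = LLM_SERVING_CONFIG[model]  # model routing step
    resp = requests.post(
        f"{base_url}/chat/completions",
        json={"model": model, "messages": messages},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"]

print(route_chat_completion("mistral-7b", [{"role": "user", "content": "Hello"}]))
```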

Configuration

Configuration is set at the GitLab instance level. For each GitLab Duo feature, a drop-down list with the following options will be presented:

  • Self-hosted model 1
  • Self-hosted model n
  • Feature Inactive

In the initial implementation a single self-hosted model will be supported, but this will be expanded to a number of GitLab-defined models.
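
As an illustration only (the feature and model identifiers below are assumptions, not GitLab’s actual configuration schema), the per-feature selection can be thought of as a mapping from each GitLab Duo feature to either one self-hosted model or an inactive state:

```python
# Hypothetical per-feature configuration; names are illustrative.
duo_feature_config = {
    "code_suggestions": "mistral-7b",     # self-hosted model 1
    "duo_chat": "mixtral-8x7b-instruct",  # self-hosted model n
    "issue_summarization": None,          # feature inactive
}

def model_for(feature: str):
    """Return the configured model for a feature, or None if the feature is inactive."""
    return duo_feature_config.get(feature)
```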

AI Gateway Deployment

Customers will be required to deploy a local instance of the AI Gateway in their own infrastructure. The AI Gateway can be installed using the published Docker container.

The AI Gateway container is published to the GitLab Container Registry and to DockerHub on every GitLab release.

Prompt Support

For each supported model and supported GitLab Duo feature, prompts will be developed and evaluated by GitLab. They will be baked into the Rails Monolith source code.

When the standard prompts are migrated into either the AI Gateway or a prompt template repository (direction is to be determined), the prompts supporting self-hosted models will also be migrated.
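
As a sketch of the idea (GitLab’s real prompts live in the Rails source and are not reproduced here), model- and feature-specific prompt retrieval amounts to a lookup keyed on both the feature and the model family, since each model expects its own instruction format:

```python
# Illustrative templates only; the actual prompts are developed and evaluated by GitLab.
PROMPTS = {
    # Mistral/Mixtral instruction format
    ("duo_chat", "mistral"): "[INST] {question} [/INST]",
    # Gemma-family turn format used by CodeGemma
    ("duo_chat", "codegemma"): "<start_of_turn>user\n{question}<end_of_turn>\n<start_of_turn>model\n",
}

def build_prompt(feature: str, model_family: str, question: str) -> str:
    """Select the prompt template matching both the Duo feature and the model."""
    return PROMPTS[(feature, model_family)].format(question=question)
```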

LLM Hosting Support

Self-hosted models are supported running on-premises on customer internal infrastructure, or in a private space on cloud providers.

Specific model support by cloud provider is listed below. The GitLab AI Gateway also needs to be installed; its Docker container is available on DockerHub and in the GitLab Container Registry.

For details on which Duo features are supported by each model, see this documentation page.

Self-Hosted inference

| Model | Availability |
|-------|--------------|
| Mistral 7B | Beta |
| Mixtral 8x7B | Beta |
| Mixtral 8x7B Instruct | Beta |
| Mixtral 8x22B | Beta |
| Codestral 22B | Beta |
| CodeGemma 2B | Beta |
| CodeGemma 7B-code | Beta |
| Code-Llama 13B | Beta |
| DeepSeek Coder 33B Instruct | Beta |
| DeepSeek Coder 33B Base | Beta |

Inference on AWS Bedrock

| Model | Availability |
|-------|--------------|
| Mistral 7B | Beta |
| Mixtral 8x7B | Beta |
| Mixtral 8x7B Instruct | Beta |
| Mixtral 8x22B | Beta |
| Codestral 22B | Beta |
| Claude 3.5 Sonnet | Beta |
| Claude 3 Haiku | Beta |

Inference on Microsoft Azure

| Model | Availability |
|-------|--------------|
| OpenAI 4o | Beta |

Installation instructions are available in the Developer documentation.

RAG / Duo Chat tools

Most of the tools available to Duo Chat behave the same for self-hosted models as they do in the GitLab-hosted AI Gateway architecture. Below are the exceptions:

Duo documentation search performed through the GitLab-managed AI Gateway (cloud.gitlab.com) relies on Vertex AI Search, which is not available to air-gapped customers. As a replacement, and only for self-hosted, air-gapped customers, an index of GitLab documentation is provided within the self-hosted AI Gateway.

This index is an SQLite database that allows for full-text search. An index is generated for each GitLab version and saved into a generic package registry. The index that matches the customer’s GitLab version is then downloaded by the self-hosted AI Gateway.
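
As a minimal sketch of how such an index can be queried (the table and column names are illustrative, not the gateway’s actual schema), SQLite’s built-in FTS5 extension provides both the full-text matching and the BM25 ranking discussed below:

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # the real index ships as a per-version database file
conn.execute("CREATE VIRTUAL TABLE docs USING fts5(title, body)")
conn.execute(
    "INSERT INTO docs VALUES (?, ?)",
    ("Merge requests", "A merge request lets you propose changes to a project."),
)

def normalize(query: str) -> str:
    """Apply the same cleaning used at index time (illustrative: lowercasing only)."""
    return query.lower()

# bm25() is FTS5's ranking function; lower scores indicate better matches.
rows = conn.execute(
    "SELECT title, bm25(docs) FROM docs WHERE docs MATCH ? ORDER BY bm25(docs)",
    (normalize("merge request"),),
).fetchall()
print(rows)
```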

Using a local index does bring some limitations:

  • BM25 search performs worse in the presence of typos, and its performance also depends on how the index was built
  • Because the indexed tokens depend on how the corpus was cleaned (stemming, tokenisation, punctuation), the same text-cleaning steps need to be applied to the user query for it to properly match the index
  • Local search diverges from other already implemented solutions, and creates a split between self-managed and GitLab-hosted instances of the AI Gateway.

Over time, we intend to replace this solution with a self-hosted Elasticsearch/OpenSearch alternative, but as of now, the percentage of self-hosted customers that have Elasticsearch enabled is low.

For further discussion, refer to the proof of concept.

Index creation

Index creation and implementation are being worked on as part of this epic.

Evaluation

Evaluation of the local search is being worked on as part of this epic.

LLM-hosting

Customers will self-manage LLM hosting. We provide limited documentation on how customers can host their own LLMs.

GitLab Duo License Management

The Self-Managed GitLab Rails instance will self-issue a token (the same process as for .com) that the local AI Gateway can verify, guaranteeing that cross-service communication is secure. Details
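
A sketch of what that verification could look like in the gateway, assuming the self-issued token is a JWT signed by GitLab Rails and checked with the PyJWT library (the key handling and claim values here are illustrative assumptions, not GitLab’s actual implementation):

```python
import jwt  # PyJWT

def verify_gitlab_token(token: str, public_key: str) -> dict:
    """Reject requests whose token was not self-issued by the paired GitLab instance."""
    # Audience and issuer values are illustrative; the real claims are defined by GitLab.
    return jwt.decode(
        token,
        public_key,
        algorithms=["RS256"],
        audience="gitlab-ai-gateway",
        issuer="gitlab-self-managed",
    )
```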

System Architectures

At this time only a single system architecture is supported. See the Out of scope section for discussion of alternatives.

Self-Managed GitLab with self-hosted AI Gateway

This system architecture supports an internet-connected GitLab and AI Gateway, and can also be run in an air-gapped environment. Customers install a self-managed AI Gateway within their own infrastructure. The long-term vision for such installations is via Runway, but until that is available a Docker-based install will be supported.

Self-Managed customers who deploy a self-managed AI Gateway will only be able to access self-hosted models at this time. Future work around Bring Your Own Key may change that.

Development Environment

Engineering documentation will be produced on how to develop this feature; this work is in progress.

Out of scope

  • It would be possible to support customer self-hosted models within a customer’s infrastructure for Dedicated or .com customers, but this is not within scope at this time.
  • Support for models other than those listed in the LLM Hosting Support section above.
  • Support for modified models.

Out of scope System Architectures

There are no plans to support these system architectures at this time; this could change if there were sufficient customer demand.

Self-Managed GitLab with .com AI Gateway

In this out-of-scope architecture, a self-managed customer continues to use the .com-hosted AI Gateway, but points back to self-hosted models.

.com GitLab with .com AI Gateway

In this out-of-scope architecture, .com customers point to self-hosted models. This topology might be desired if a specific model produced better results for a given feature, or if customers could improve response latency by using their own model-serving infrastructure.

GitLab Dedicated

Support will not be provided for Dedicated customers to use a self-hosted AI Gateway and self-hosted models. Dedicated customers who use GitLab Duo features can access them via the .com AI Gateway. If there is customer demand for self-hosted models for Dedicated customers, this can be considered in the future.
