AI Context Management

This page contains information related to upcoming products, features, and functionality. It is important to note that the information presented is for informational purposes only. Please do not rely on this information for purchasing or planning purposes. The development, release, and timing of any products, features, or functionality may be subject to change or delay and remain at the sole discretion of GitLab Inc.
Status Authors Coach DRIs Owning Stage Created
proposed dmishunov jessieay bvenker dmishunov devops data-stores 2023-06-03

Glossary

  • AI Context. In the scope of this technical blueprint, the term “AI Context” refers to supplementary information provided to the AI system alongside the primary prompts.
  • AI Context Policy. The “AI Context Policy” is a user-defined and user-managed mechanism allowing precise control over the content that can be sent to the AI as contextual information. In the context of this blueprint, the AI Context Policy is suggested as a YAML configuration file.
  • AI Context Policy Management. Within this blueprint, “Management” encompasses the user-driven processes of creating, modifying, and removing AI Context Policies according to specific requirements and preferences.
  • Automatic AI Context. AI Context retrieved automatically based on the active document. Automatic AI Context can come from the active document’s dependencies (modules, methods, etc., imported into the active document), from search-based mechanisms, or from other mechanisms over which the user has limited control.
  • Supplementary User Context: User-defined AI Context, such as open tabs in IDEs, local files, and folders, that the user provides from their local environment to extend the default AI Context.
  • AI Context Retriever: A backend system capable of:
    • communicating with AI Context Policy Management
    • fetching content defined in Automatic AI Context and Supplementary User Context (complete files, definitions, methods, etc.), based on the AI Context Policy Management
    • correctly augmenting the user prompt with AI Context before sending it to the LLM. Presumably, this part is already handled by AI Gateway.
  • Project Administrator. In the context of this blueprint, “Project Administrator” means any individual with the “Edit project settings” permission (“Maintainer” or “Owner” roles, as defined in Project members permissions).

Illustration of the AI Context architecture

Summary

Correct context can dramatically improve the quality of AI responses. This blueprint aims to accommodate AI Context seamlessly into our offering by architecting a solution that is ready for this additional context coming from different AI features.

However, we recognize the importance of security and trust, which automatic solutions do not necessarily provide. To address any concerns users might have about the content fed into the AI Context, this blueprint suggests providing them with control and customization options. This way, users can adjust the content according to their preferences and have a clear understanding of what information is being utilized.

This blueprint proposes a system for managing AI Context at the Project Administrator and individual user levels. Its goal is to allow Project Administrators to set high-level rules for what content can be included as context for AI prompts while enabling users to specify Supplementary User Context for their prompts. The global AI Context Policy will be a YAML configuration file stored in the same Git repository as the project it governs. The suggested format of the YAML configuration files is discussed below.

Motivation

Ensuring the AI has the correct context is crucial for generating accurate and relevant code suggestions or responses. As the adoption of AI-assisted development grows, it’s essential to give organizations and users control over what project content is sent as context to AI models. Some files or directories may contain sensitive information that should not be shared. At the same time, users may want to provide additional context for their prompts to get more relevant suggestions. We need a flexible AI Context management system to handle these cases.

Goals

For Project Administrators

  • Allow Project Administrators to set the default AI Context Policy to control whether content can or cannot be automatically included in the AI Context when making requests to LLMs
  • Allow Project Administrators to specify exceptions to the default AI Context Policy
  • Provide a UI to manage the default AI Context Policy and its exceptions list easily

For users

  • Allow users to set Supplementary User Context to include as AI Context for their prompts
  • Provide a UI to manage Supplementary User Context easily

Non-Goals

  • AI Context Retriever architecture: different environments (Web, IDEs) will probably implement their own retrievers. However, a unified public interface for the retrievers should be considered.
  • Extremely granular controls like allowing/excluding individual lines of code
  • Storing entire file contents from user projects; only paths will be persisted

Proposal

The proposed architecture consists of 3 main parts:

  • AI Context Retriever
  • AI Context Policy Management
  • Supplementary User Context

There are several ongoing efforts related to various implementations of AI Context Retriever, both for the Web and for IDEs. Because of that, the architecture for AI Context Retriever is beyond the scope of this blueprint. However, in the context of this blueprint, it is assumed that:

  • AI Context Retriever is capable of automatically retrieving and fetching Automatic AI Context and passing it on as AI Context to the LLM.
  • AI Context Retriever can automatically retrieve and fetch Supplementary User Context and pass it on as AI Context to the LLM.
  • AI Context Retriever implementation can ensure that any content passed as AI Context to a model adheres to the global AI Context Policy.
  • AI Context Retriever can trim the AI Context to meet the context window requirement of the specific LLM used for a given Duo feature.

AI Context Policy Management proposal

To implement the AI Context Policy Management system, it is proposed to:

  • Introduce the YAML file format for configuring global policies
  • In the YAML configuration file, support two ai_context_policy types:
    • block: blocks all content except for the specified exclude paths. Excluded files are allowed. (Default)
    • allow: allows all content except for the specified exclude paths. Excluded files are blocked.
  • In the YAML configuration file, support a version key that specifies the schema version of the AI context file, starting with version: 1. If omitted, it is treated as the latest version known to the client.
  • In the YAML configuration file, support glob patterns to exclude certain paths from the global policy
  • Support nested AI Context Policies to provide more granular control of AI Context in sub-folders. For example, a policy in /src/tests would override a policy in /src, which, in turn, would override a global AI Context Policy in /.
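
The nested-policy override described in the last item can be sketched as a lookup for the nearest folder that carries a policy. This is only an illustration under stated assumptions: the function name resolve_policy, the shape of the policies mapping (folder path relative to the repository root, with "" for the root, mapped to the already-parsed YAML), and the choice to let the nearest policy replace higher-level ones entirely rather than merge with them are all hypothetical and not defined by this blueprint.

```python
from pathlib import PurePosixPath
from typing import Optional


def resolve_policy(path: str, policies: dict) -> Optional[tuple]:
    """Return (folder, policy) for the nearest folder above `path` that has a policy.

    `policies` maps a folder path relative to the repository root ("" is the
    root itself) to the parsed contents of the policy file in that folder.
    Returns None when no folder on the way up to the root defines a policy.
    """
    # Normalize "/src/app.py" to "src/app.py" so the walk always ends at the root.
    folder = PurePosixPath(path.lstrip("/")).parent
    while True:
        key = "" if str(folder) == "." else str(folder)
        if key in policies:
            return key, policies[key]
        if key == "":
            return None  # reached the repository root without finding a policy
        folder = folder.parent


# Hypothetical repository layout with three policy files.
policies = {
    "": {"ai_context_policy": "block"},
    "src": {"ai_context_policy": "allow"},
    "src/tests": {"ai_context_policy": "block", "exclude": ["fixtures/**"]},
}

folder, policy = resolve_policy("src/tests/unit/test_a.py", policies)
print(folder, policy["ai_context_policy"])
```

With this layout, a file under src/tests is governed by the policy in src/tests, a file directly under src by the policy in src, and everything else by the root policy.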

Supplementary User Context proposal

To implement the Supplementary User Context system, it is proposed to:

  • Introduce user-level UI to specify Supplementary User Context for prompts. A particular implementation of the UI could differ in different environments (IDEs, Web, etc.), but the actual design of these implementations is beyond the scope of this architecture blueprint
  • The user-level UI should communicate to the user what is in the Supplementary User Context at any moment.
  • The user-level UI should allow the user to edit the contents of the Supplementary User Context.

Optional steps

  • Provide UI for Project Administrators to configure global AI Context Policy. Source Editor can be used as the editor for this type of YAML file format, similar to the Security Policy Editor.
  • Implement a validation mechanism for AI Context Policies that notifies Project Administrators when the YAML configuration file has an invalid format. This could be a CI job. To catch possible issues proactively, it is also advised to add the validation step to the pre-push static analysis.
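
As a rough sketch of what such a validation step might check, the function below inspects an already-parsed policy document and returns a list of problems. The names validate_policy, LATEST_VERSION, and KNOWN_KEYS are hypothetical, and several rules are assumptions rather than part of this blueprint: that unknown keys are rejected, that an omitted ai_context_policy falls back to block, and that a version newer than the one the client knows is an error. Parsing the YAML itself is left out to keep the example dependency-free.

```python
LATEST_VERSION = 1  # assumption: the only schema version known to this client
KNOWN_KEYS = {"ai_context_policy", "exclude", "version"}


def validate_policy(doc: object) -> list:
    """Return human-readable errors for a parsed policy; an empty list means valid."""
    if not isinstance(doc, dict):
        return ["the policy file must contain a mapping at the top level"]

    errors = []
    # Assumption: keys outside the proposed format are reported, not ignored.
    for key in sorted(str(k) for k in doc if k not in KNOWN_KEYS):
        errors.append(f"unknown key: {key}")

    # Assumption: block is used when the key is omitted.
    mode = doc.get("ai_context_policy", "block")
    if mode not in ("allow", "block"):
        errors.append(f"ai_context_policy must be 'allow' or 'block', got {mode!r}")

    exclude = doc.get("exclude", [])
    if not isinstance(exclude, list) or not all(
        isinstance(p, str) and p for p in exclude
    ):
        errors.append("exclude must be a list of non-empty glob patterns")

    # An omitted version is treated as the latest version known to the client.
    version = doc.get("version", LATEST_VERSION)
    if (
        isinstance(version, bool)
        or not isinstance(version, int)
        or not 1 <= version <= LATEST_VERSION
    ):
        errors.append(f"version must be an integer between 1 and {LATEST_VERSION}")

    return errors


print(validate_policy({"ai_context_policy": "deny", "exclude": "secrets/**"}))
```

A CI job or a pre-push hook could run a check like this and fail when the returned list is not empty.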

Design and implementation details

  • YAML Configuration File Format: The proposed YAML configuration file format for defining the global AI Context Policy is as follows:

    ai_context_policy: [allow|block]
    
    exclude:
    - glob/**/pattern
    

    The ai_context_policy section specifies the current policy for this and all underlying folders in a repo.

    The exclude section specifies the exceptions to the ai_context_policy. Technically, it’s an inversion of the policy. For example, if we specify foo_bar.js in exclude:

    • for the allow policy, it means that foo_bar.js will be blocked
    • for the block policy, it means that foo_bar.js will be allowed
  • User-Level UI for Supplementary User Context: The UI for specifying Supplementary User Context for prompts can be implemented differently depending on the environment (IDEs, Web, etc.). However, the implementation should ensure users can provide additional context for their prompts. The specified Supplementary User Context for each user can be stored as:

    • a preference stored in the user profile in GitLab

      • Pros: Consistent across devices and environments (Web, IDEs, etc.)
      • Cons: Additional work in the monolith, potentially a lot of new read/writes to a database
    • a preference stored in the local IDE/Web storage

      • Pros: User-centric, local to user environment
      • Cons: Different implementations for different environments (Web, IDEs, etc.), doesn’t survive switching environment or device

In both cases, the storage should allow the preference to be associated with a particular repository. Factors like data consistency, performance, and implementation complexity should guide the decision on what type of storage to use.

  • To mitigate potential performance and scalability issues, it would make sense to keep AI Context Retriever and AI Context Policy Management in the same environment as the feature that needs them. This would be the Language Server for Duo features in IDEs and different services in the monolith for Duo features on the Web.
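
The exclude inversion described earlier in this section can be illustrated with a small evaluation function. This is a minimal sketch, not the proposed implementation: the names glob_to_regex and is_allowed are hypothetical, and the blueprint does not specify a glob dialect, so the example assumes a gitignore-like one in which "**/" spans zero or more folders while "*" and "?" stay within a single path segment. It also assumes paths are relative to the folder holding the policy file and that block applies when ai_context_policy is omitted.

```python
import re


def glob_to_regex(pattern: str) -> "re.Pattern":
    """Translate a glob into a regex (assumed dialect, see the note above).

    Character classes such as [abc] are not supported and match literally.
    """
    out, i = [], 0
    while i < len(pattern):
        if pattern.startswith("**/", i):
            out.append(r"(?:.*/)?")  # zero or more whole folders
            i += 3
        elif pattern.startswith("**", i):
            out.append(r".*")  # anything, including slashes
            i += 2
        elif pattern[i] == "*":
            out.append(r"[^/]*")  # anything within one path segment
            i += 1
        elif pattern[i] == "?":
            out.append(r"[^/]")  # one character within a path segment
            i += 1
        else:
            out.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(out))


def is_allowed(path: str, policy: dict) -> bool:
    """Decide whether `path` may be sent as AI Context under `policy`."""
    mode = policy.get("ai_context_policy", "block")  # assumption: block by default
    if mode not in ("allow", "block"):
        raise ValueError(f"unknown ai_context_policy: {mode!r}")
    excluded = any(
        glob_to_regex(p).fullmatch(path) for p in policy.get("exclude", [])
    )
    # exclude inverts the policy: excluded paths are allowed under block
    # and blocked under allow.
    return excluded if mode == "block" else not excluded


allow_policy = {"ai_context_policy": "allow", "exclude": ["secrets/**", "**/*.pem"]}
block_policy = {"ai_context_policy": "block", "exclude": ["docs/**/*.md"]}

print(is_allowed("src/app.py", allow_policy))
print(is_allowed("docs/guide/intro.md", block_policy))
```

Under allow_policy everything is allowed except the secrets folder and .pem files; under block_policy only Markdown files below docs are allowed.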

Data flow

Here’s the draft of the data flow demonstrating the role of AI Context using the Code Suggestions feature as an example.

sequenceDiagram
 participant CS as Code Suggestions
 participant CR as AI Context Retriever
 participant PM as AI Context Policy Management
 participant LLM as Language Model

 CS->>CR: Request Code Suggestion
 CR->>CR: Retrieve Supplementary User Context list
 CR->>CR: Retrieve Automatic AI Context list
 CR->>PM: Check AI Context against Policy
 PM-->>CR: Return valid AI Context list
 CR->>CR: Fetch valid AI Context
 CR->>LLM: Send prompt with final AI Context
 LLM->>LLM: Generate code suggestions
 LLM-->>CS: Return code suggestions
 CS->>CS: Present code suggestions to the user

If the AI Context Retriever fails to fetch some of the content in the AI Context, the prompt is sent with the AI Context that was successfully fetched. In the low-probability case where the AI Context Retriever cannot fetch any content at all, the prompt should be sent as-is.
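
The fallback behaviour above can be sketched as follows. The function build_prompt, the fetch callback, and the way context is prepended to the prompt are all hypothetical; the blueprint leaves the actual prompt augmentation to the retriever and, presumably, AI Gateway. The list of paths is assumed to have already passed the policy check.

```python
from typing import Callable, Iterable


def build_prompt(prompt: str, paths: Iterable[str], fetch: Callable[[str], str]) -> str:
    """Augment `prompt` with whatever AI Context could be fetched.

    Items that fail to fetch are skipped. If nothing could be fetched,
    the prompt is returned unchanged.
    """
    fetched = []
    for path in paths:
        try:
            fetched.append((path, fetch(path)))
        except Exception:
            # A failed item must not block the request; continue with the rest.
            continue
    if not fetched:
        return prompt  # send the prompt as-is
    # Hypothetical layout: each item is prefixed with its path.
    context = "\n\n".join(f"# {path}\n{content}" for path, content in fetched)
    return f"{context}\n\n{prompt}"


# Stand-in for a real fetcher: looks files up in memory, raises KeyError if missing.
demo_files = {"a.py": "A = 1"}


def demo_fetch(path: str) -> str:
    return demo_files[path]


print(build_prompt("complete this", ["a.py", "missing.py"], demo_fetch))
```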

Alternative solutions

JSON Configuration Files

  • Pros: Widely used, easier integration with web technologies.
  • Cons: Less readable compared to YAML for complex configurations.

Database-Backed Configuration

  • Pros: Centralized management, dynamic updates.
  • Cons: Not version controlled.

Environment Variables

  • Pros: Simplifies configuration for deployment and scaling.
  • Cons: Less suitable for complex configurations.

Policy as Code (without YAML)

  • Pros: Better control and auditing with versioned code.
  • Cons: It requires users to write code and us to invent a language for it.

Policy in .ai_ignore and other Git-like files

  • Pros: Provides a straightforward approach, identical to the allow policy with the list of exclude suggested in this blueprint
  • Cons: Supports only the allow policy; the processing of this file type still has to be implemented

Based on these alternatives, YAML was chosen as the format for this blueprint because the file is versioned in Git and is more versatile than the .ai_ignore alternative.

Suggested iterative implementation plan

Please refer to the Proposal for a detailed explanation of the items in every iteration.

Iteration 1

  • Introduce the global .ai-context-policy.yaml YAML configuration file format and schema for this file type as part of AI Context Policy Management.
  • Introduce support for Supplementary User Context in AI Context Retrievers.
  • Optional: validation mechanism (like CI job and pre-push static analysis) for .ai-context-policy.yaml

Success criteria for the iteration: Prompts sent from the Code Suggestions feature in IDEs contain AI Context consisting only of the open IDE tabs that adhere to the global AI Context Policy in the root of a repository.

Iteration 2

  • Introduce support for Automatic AI Context in AI Context Retrievers.
  • Connect more features to the AI Context Management system.

Success criteria for the iteration: Prompts sent from the Code Suggestions feature in IDEs contain AI Context with items of Automatic AI Context that adhere to the global AI Context Policy in the root of a repository.

Iteration 3

  • Connect all Duo features on the Web and in IDEs to AI Context Retrievers and adhere to the global AI Context Policy.

Success criteria for the iteration: All Duo features in all environments send AI Context which adheres to the global AI Context Policy

Iteration 4

  • Support nested .ai-context-policy.yaml YAML configuration files.

Success criteria for the iteration: AI Context Policies placed in the sub-folders of a repository override higher-level policies when sending prompts.

Iteration 5

  • User-level UI for Supplementary User Context.

Success criteria for the iteration: Users can see and edit the contents of the Supplementary User Context and the context is shared between all Duo features within the environment (Web, IDEs, etc.)

Iteration 6

  • Optional: UI for configuring the global AI Context Policy.

Success criteria for the iteration: Users can see and edit the contents of the AI Context Policies in a UI editor.


AI Context Management ADR 001: Keeping AI Context Policy Management close to AI Context Retriever

Summary

To manage AI Context effectively and ensure flexible and scalable solutions, AI Context Policy Management will reside in the same environment as the AI Context Retriever and, as a result, as close to the context fetching mechanism as possible. This approach aims to reduce latency and improve user control over the contextual information sent to AI systems.

Context

The original blueprint outlined the necessity of a flexible AI Context Management system to provide accurate and relevant AI responses while addressing security and trust concerns. It suggested that AI Context Policy Management should act as a filtering solution between the context resolver and the context fetcher in the AI Context Retriever. However, the blueprint did not specify the exact location for the AI Context Policy Management within the system.

Last modified November 1, 2024: Remove trailing spaces (6f6d0996)