Vertex AI Search
Retrieve GitLab Documentation
- Statistics (as of January 2024):
- Data type: Markdown (Unstructured) written in natural language
- Data access level: Green (No authorization required)
- Data source:
https://gitlab.com/gitlab-org/gitlab/-/blob/master/doc
- Data size: approx. 56,000,000 bytes. 2194 pages.
- Service:
https://docs.gitlab.com/
(source repo)
- Example of user input: “How do I create an issue?”
- Example of expected AI-generated response: “To create an issue:\n\nOn the left sidebar, select Search or go to and find your project.\n\nOn the left sidebar, select Plan > Issues, and then, in the upper-right corner, select New issue.”
The GitLab Documentation service is the single source of truth (SSoT) for serving GitLab documentation to SaaS (both GitLab.com and Dedicated) and Self-managed instances.
Since 16.0, when a user opens a documentation link in a GitLab instance,
they are redirected to this service (except for air-gapped solutions).
In addition, the current search backend of docs.gitlab.com
needs to transition to Vertex AI Search. See this issue (GitLab members only) for more information.
We introduce a new semantic search API powered by Vertex AI Search for the documentation tool of GitLab Duo Chat.
Setup in Vertex AI Search
We create a search app for each GitLab version. These processes will likely be automated in the GitLab Documentation project by CI/CD pipelines.
- Create a new BigQuery table, e.g. gitlab-docs-latest or gitlab-docs-v16.4.
- Download documents from repositories (e.g. gitlab-org/gitlab/doc, gitlab-org/gitlab-runner/docs, gitlab-org/omnibus-gitlab/doc).
- Split them by Markdown headers and generate metadata (e.g. URL and title).
- Insert rows into the BigQuery table.
- Create a search app.
See this notebook for more implementation details. The data of the latest version will be refreshed by a nightly build with the Data Store API.
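The "split by Markdown headers and generate metadata" step can be sketched as follows. This is a minimal illustration and not the code from the notebook: the function name `split_markdown_by_headers`, the row layout, and the use of an MD5 content hash as the row ID are all assumptions, chosen only so that the rows resemble the search results shown later in this document.

```python
import hashlib
import re

HEADER_RE = re.compile(r"^(#{1,6})\s+(.*?)\s*$")
FENCE = "`" * 3  # Marker that opens or closes a fenced code block


def split_markdown_by_headers(text, url):
    """Split a Markdown document into one row per header section.

    Hypothetical helper: each row holds the section body plus metadata
    (the enclosing header titles and the page URL).
    """
    rows = []
    trail = {}        # header level -> title of the enclosing header
    body = []         # lines of the section currently being collected
    in_fence = False  # '#' lines inside fenced code are not headers

    def flush():
        content = "\n".join(body).strip()
        if content:
            metadata = {f"Header{lvl}": title for lvl, title in sorted(trail.items())}
            metadata["url"] = url
            # Assumption: a content hash serves as a stable row ID.
            row_id = hashlib.md5((url + content).encode("utf-8")).hexdigest()
            rows.append({"id": row_id, "content": content, "metadata": metadata})
        body.clear()

    for line in text.splitlines():
        if line.lstrip().startswith(FENCE):
            in_fence = not in_fence
        match = None if in_fence else HEADER_RE.match(line)
        if match:
            flush()
            level = len(match.group(1))
            # A new header replaces any header of the same or deeper level.
            for stale in [lvl for lvl in trail if lvl >= level]:
                del trail[stale]
            trail[level] = match.group(2)
        else:
            body.append(line)
    flush()
    return rows


doc = (
    "# Create an issue\n\nIntro text.\n\n"
    "## From a group\n\nTo create an issue from a group:\n1. On the left sidebar, ...\n"
)
rows = split_markdown_by_headers(
    doc, "https://docs.gitlab.com/ee/user/project/issues/create_issues.html"
)
```

Each resulting row could then be inserted into the BigQuery table as one searchable document.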
AI Gateway API
The API design follows the existing patterns in AI Gateway.
POST /v1/search/docs
{
  "type": "search",
  "metadata": {
    "source": "GitLab EE",
    "version": "16.3" // Used for switching search apps for older GitLab instances
  },
  "payload": {
    "query": "How can I create an issue?",
    "params": { // Params for Vertex AI Search
      "page_size": 10,
      "filter": ""
    },
    "provider": "vertex-ai"
  }
}
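As a rough client-side illustration, the request above could be assembled like this. It is a sketch under assumptions: the host `https://ai-gateway.example.com` and the helper name `build_docs_search_request` are hypothetical, and authentication is omitted because this document does not describe it.

```python
import json
import urllib.request


def build_docs_search_request(base_url, query, gitlab_version, page_size=10):
    """Build (but do not send) a POST request for /v1/search/docs."""
    body = {
        "type": "search",
        "metadata": {"source": "GitLab EE", "version": gitlab_version},
        "payload": {
            "query": query,
            "params": {"page_size": page_size, "filter": ""},
            "provider": "vertex-ai",
        },
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/search/docs",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical host; the real AI Gateway URL is not given in this document.
request = build_docs_search_request(
    "https://ai-gateway.example.com", "How can I create an issue?", "16.3"
)
# Sending is left to the caller, for example urllib.request.urlopen(request),
# together with whatever authentication headers the gateway requires.
```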
The response will include the search results. For example:
{
  "response": {
    "results": [
      {
        "id": "d0454e6098773a4a4ebb613946aadd89",
        "content": "\nTo create an issue from a group: \n1. On the left sidebar, ...",
        "metadata": {
          "Header1": "Create an issue",
          "Header2": "From a group",
          "url": "https://docs.gitlab.com/ee/user/project/issues/create_issues.html"
        }
      }
    ]
  },
  "metadata": {
    "provider": "vertex-ai"
  }
}
See SearchRequest and SearchResponse for Vertex AI API specs.
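A consumer of this endpoint might turn the response into a list of snippets before passing them to the Duo Chat documentation tool. The following is only a sketch: `DocSnippet` and `parse_search_response` are hypothetical names, and joining the `HeaderN` fields into a breadcrumb title is an illustrative choice, not something this design prescribes.

```python
import json
from dataclasses import dataclass


@dataclass
class DocSnippet:
    title: str
    url: str
    content: str


def parse_search_response(raw):
    """Convert the JSON body of a /v1/search/docs response into DocSnippet objects."""
    data = json.loads(raw)
    snippets = []
    for result in data.get("response", {}).get("results", []):
        metadata = result.get("metadata", {})
        # Join Header1, Header2, ... into a breadcrumb-style title.
        header_keys = sorted(key for key in metadata if key.startswith("Header"))
        title = " > ".join(metadata[key] for key in header_keys)
        snippets.append(
            DocSnippet(
                title=title,
                url=metadata.get("url", ""),
                content=result.get("content", "").strip(),
            )
        )
    return snippets


raw_response = json.dumps({
    "response": {
        "results": [
            {
                "id": "d0454e6098773a4a4ebb613946aadd89",
                "content": "\nTo create an issue from a group: \n1. On the left sidebar, ...",
                "metadata": {
                    "Header1": "Create an issue",
                    "Header2": "From a group",
                    "url": "https://docs.gitlab.com/ee/user/project/issues/create_issues.html",
                },
            }
        ]
    },
    "metadata": {"provider": "vertex-ai"},
})
snippets = parse_search_response(raw_response)
```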
Proof of Concept
- GitLab-Rails MR
- AI Gateway MR
- Vertex AI Search service
- Google Colab notebook
- Demo video (Note: In this video, website URLs are used as the data source).
Evaluation score
Here are the evaluation scores generated by the Prompt Library.
| Setup | correctness | comprehensiveness | readability | evaluating_model |
|---|---|---|---|---|
| New (w/ Vertex AI Search) | 3.7209302325581382 | 3.6976744186046511 | 3.9069767441860455 | claude-2 |
| Current (w/ Manual embeddings in GitLab-Rails and PgVector) | 3.7441860465116279 | 3.6976744186046511 | 3.9767441860465116 | claude-2 |
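To make the comparison easier to read, the per-metric differences between the two setups can be computed directly from the table. This is plain arithmetic on the reported numbers; the dictionary keys `new` and `current` are labels chosen here for illustration.

```python
scores = {
    "new": {
        "correctness": 3.7209302325581382,
        "comprehensiveness": 3.6976744186046511,
        "readability": 3.9069767441860455,
    },
    "current": {
        "correctness": 3.7441860465116279,
        "comprehensiveness": 3.6976744186046511,
        "readability": 3.9767441860465116,
    },
}

# Positive values mean the new setup scored higher than the current one.
deltas = {
    metric: round(scores["new"][metric] - scores["current"][metric], 4)
    for metric in scores["new"]
}
print(deltas)
# {'correctness': -0.0233, 'comprehensiveness': 0.0, 'readability': -0.0698}
```

By these numbers the two setups tie on comprehensiveness, and the new setup scores about 0.02 lower on correctness and about 0.07 lower on readability.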
Dataset
- Input BigQuery table:
dev-ai-research-0e2f8974.duo_chat_external.documentation__input_v1
- Output BigQuery table:
dev-ai-research-0e2f8974.duo_chat_external_results.sm_doc_tool_vertex_ai_search
dev-ai-research-0e2f8974.duo_chat_external_results.sm_doc_tool_legacy
- Command:
promptlib duo-chat eval --config-file /eval/data/config/duochat_eval_config.json
Estimated Time of Completion
- Milestone N:
- Setup in Vertex AI Search with CI/CD automation.
- Introduce the /v1/search/docs endpoint in AI Gateway.
- Update the retrieval logic in GitLab-Rails.
- Feature flag clean up.
Total milestones: 1