Global Search Group

Vision

The Global Search Group focuses on bringing world-class search functionality to GitLab.com and self-managed instances.

This page covers processes and information specific to the Global Search group. See also the Global Search and Code Search direction pages.

Mission

The group is responsible for improving and expanding our current global search implementations using Elasticsearch, PostgreSQL, and Gitaly. Areas of responsibility include global search functionality, UI, ingestion mechanisms, optimal indexing, administrative tools, and installation mechanisms for self-managed installations.

Additionally, we support AI features through Retrieval Augmented Generation (RAG) work, which includes:

  • Identifying and preparing new useful data for our AI-powered features in collaboration with feature teams and the AI Framework team
  • Storing vector embeddings of epics, issues, MRs, source code, and more
  • Providing retrieval APIs for those vector embeddings, including metadata filtering and permission enforcement
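Retrieval with metadata filtering and permission enforcement can be sketched as below. This is a minimal in-memory illustration, not GitLab's implementation: the Record type, the retrieve function, and modelling permissions as a set of accessible project IDs are all hypothetical, and a production system would run the similarity search inside the search backend rather than in application code.

```python
import math
from dataclasses import dataclass
from typing import Collection, List, Optional, Sequence


@dataclass(frozen=True)
class Record:
    """Hypothetical stored item: an embedding plus the metadata used for filtering."""
    record_id: str
    record_type: str  # illustrative values: "issue", "merge_request", "code"
    project_id: int
    embedding: Sequence[float]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(
    query: Sequence[float],
    records: Sequence[Record],
    allowed_project_ids: Collection[int],
    record_types: Optional[Collection[str]] = None,
    k: int = 3,
) -> List[str]:
    """Return up to k record IDs, most similar first.

    Permission and metadata filters run BEFORE ranking, so records the
    caller cannot access are never scored or returned.
    """
    candidates = [
        r
        for r in records
        if r.project_id in allowed_project_ids
        and (record_types is None or r.record_type in record_types)
    ]
    # sorted() is stable, so records with equal similarity keep their input order.
    ranked = sorted(
        candidates, key=lambda r: cosine_similarity(query, r.embedding), reverse=True
    )
    return [r.record_id for r in ranked[:k]]
```

Filtering before ranking is the conservative choice in this sketch: ranking first and trimming afterwards can return fewer than k results and makes it easier to leak records the user is not allowed to see.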

This team doesn’t own custom searches for specific features, such as the “filter bar” on issues which is part of the Issue Tracking category owned by the Project Management group.

Team Members

The following team members are permanent members of the Global Search Group:

Name Role
Changzheng Liu Backend Engineering Manager, Global Search
Arturo Herrero Staff Backend Engineer, Global Search
Dmitry Gruzd Staff Backend Engineer, Global Search
John Mason Senior Backend Engineer, Global Search
Madelein van Niekerk Senior Backend Engineer, Global Search
Ravi Kumar Backend Engineer, Global Search
Siddharth Dungarwal Backend Engineer, Global Search
Terri Chu Staff Backend Engineer, Global Search
Tomáš Bulva Senior Frontend Engineer, Global Search

Stable Counterparts

The following members of other functional teams are our stable counterparts:

Name Role
Ben Venker Senior Product Manager, Global Search
Ashraf Khamis Senior Technical Writer
Cleveland Bledsoe Jr Senior Support Engineer
Brenda Nyaringita Support Engineer (EMEA)

Shared Responsibilities

The Global Search team shares responsibilities with the AI Framework team in the area of Retrieval Augmented Generation (RAG). Specifically, we collaborate on the data preparation and information retrieval stages of the RAG process.

Meetings

Whenever possible, we prefer to communicate asynchronously using issues, merge requests, and Slack. However, face-to-face meetings are useful for establishing a personal connection and addressing items that would be more efficiently discussed synchronously, such as blockers.

  • The Global Search Group meets weekly on Tuesdays at 14:00 UTC.
  • The Global Search Group also has an Open Discussion Hour on Thursdays at 12:30 UTC.

Work

We follow the general workflow and principles defined in Product Development Flow and Engineering Workflow. To bring an issue to our attention, please create an issue in the relevant project. Add the ~"group::global search" label and any other suitable labels. If the issue is urgent, please reach out to the Product Manager (listed under Stable Counterparts) or the Engineering Manager (listed under Team Members).

Below are a few guidelines the team follows in the day-to-day work.

  • We use asynchronous communication with each other and with other GitLab teams via GitLab, Slack, Google Docs, etc.
  • We have weekly team meetings, 1-on-1 meetings, and virtual happy hours via Zoom to discuss various topics and create team bonding.
  • We encourage all backend engineers in our team to have their changes reviewed by someone else in our group. It’s great for knowledge sharing.
  • We organize our tasks under epics and issues. The Product Manager and Engineering Manager go through the backlog during the planning phase of each release and assign issues to the next one or two milestones. Issues on the milestone board are sorted by priority, with higher-priority issues at the top.
  • Before a milestone starts, we apply the Deliverable label to the issues we intend to close in that milestone. Issues added during a milestone should not receive the Deliverable label. We review these issues in the middle of the milestone, usually during the first week of each month, and remove the Deliverable label from issues that are unlikely to make it into the release.
  • We apply the Stretch label to the issues that we intend to start during a milestone but are not committing to closing.
  • We work with the UX team on features that need design input by labeling the issues with a UX workflow label and assigning the corresponding UX counterpart. We use workflow::problem validation and workflow::solution validation for user research, and workflow::design for UI design and prototyping. Once the design is finished, the workflow::ready for development label is added to indicate that development can start. For minor UX/UI changes, we contact our UX counterpart or the Product Design Manager to request a review for fast iterations.
  • We work with the Quality team on issues that require input from a testing perspective by labeling the issues with workflow::planning breakdown and assigning the SET (Software Engineer in Test) counterpart. Once the SET reviews the issue, they acknowledge it with the quad-planning::complete-action or quad-planning::complete-no-action label.
  • We work with the Technical Writing team on issues that need documentation changes by labeling the issues with documentation and assigning our counterpart in the Technical Writing team. Our technical writer helps us update the corresponding documentation. The documentation change normally happens together with the code change.
  • We work with our stable counterpart in the Security team for issues that need input from a security perspective. We suggest using team planning issues, for example, this one, for communication.
  • We work with the Support Engineering team by collaborating on issues directly. We invite our counterpart in the Support Engineering team to our team meeting every month to have direct communication.
  • When team members are ready for their next tasks, they will pick an issue from the milestone board and become the issue owner by assigning the issue to themselves. Team members should prioritize issues with the Deliverable label. The issue owner will be responsible for finding the solution to the issue. They can propose a solution by opening a Merge Request. They can also break down the issue into smaller sub-issues if it makes sense to take an iterative approach.
  • Before going out of office for an extended time, assign items still in review to the Engineering Manager. The Engineering Manager can reassign as needed.
  • When a team member reviews work by an author who is out of office for an extended time, they are welcome to complete the requested changes if they are comfortable taking on the remaining work.
  • We review and prioritize bugs every week. Bug reports often describe the problem without identifying its impact. Product Management and QA share responsibility for assessing every bug's priority, severity, and details. Severity uses an approximation of the Risk Matrix to identify potential risk and frequency. Priority is based on total impact over time. Occasionally, a lower-priority or lower-severity bug is added to a milestone when it relates to currently scheduled work.
    1. Review all new bugs for content, priority, severity, and milestones
    2. Review any bugs missing priority or severity
    3. Prioritize bugs for the current milestone. 10% of scheduled work should be focused on bugs
    4. Schedule bugs for future milestones based on capacity, severity, priority, and relationship to any scheduled work

Breaking changes process

Before a major milestone starts, we prepare an epic with all the breaking change issues linked. As usual, we work to get approvals, but we keep each MR in draft to prevent it from merging before the major milestone. If an MR is independent, it can target master directly. If MRs depend on each other, we create a chain in which each MR targets the branch of the previous one. As soon as the first MR merges, the next one automatically retargets master.

Every MR that was created before the breaking change milestone should have this or a similar warning in the description: :warning: This MR must be kept as a draft and cannot be merged until **DATE** :warning:

Bugfix backport process

We review the bugfix merge requests every week. To facilitate this process, we have created scoped labels: backport::required, backport::skip, and backport::complete.

  • The backport::skip label is added to merge requests that do not need a backport.
  • During the initial review, the backport::required label is added to merge requests that need to be backported to a previous release. The DRI then follows the patch release process to backport the fix. Once the backport is done, the backport::complete label is added to indicate that the whole process is complete.

Advanced Global Search Rollout on GitLab.com

The team has been actively working on enabling Elasticsearch-powered Advanced Search on GitLab.com. Based on our analysis, we set our first target as rolling this feature out to all paid groups on GitLab.com. You can find more details about the timeline and progress in the links below.

Type of operation | ~severity::1 - Blocker | ~severity::2 - Critical | ~severity::3 - Major | ~severity::4 - Low
Recall Record, Global | Above 10 seconds, up to timing out | Between 7 and 10 seconds | Between 4 and 7 seconds | Between 2 and 4 seconds
Time until inserted record is recallable | Above 15 minutes | Between 10 and 15 minutes | Between 5 and 10 minutes | Between 3 and 5 minutes

The table above defines severity thresholds for two types of operations:

  • Recall Record, Global: This is the time it takes to recall a record using a globally scoped search of GitLab.com. Records could be entities such as projects, users, groups, etc.
  • Time until inserted record is recallable: This is the elapsed time between adding a new record and having that new record be recallable via a search. This process depends on many underlying technologies such as the Go indexer, Sidekiq queues, and the Elasticsearch database.
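The thresholds in the severity table can be encoded as a small lookup, for example to label measurements collected by a monitoring job. This is a hypothetical sketch: the names are illustrative, and because the table does not say which bucket a value on a shared boundary (such as exactly 7 seconds) belongs to, the code assumes it goes to the more severe bucket. Values below the ~severity::4 range return None.

```python
from typing import Optional, Sequence, Tuple

# (lower bound, label) pairs from the severity table, most severe first.
# Recall Record, Global: measured in seconds.
RECALL_SECONDS: Sequence[Tuple[float, str]] = (
    (10.0, "severity::1"),
    (7.0, "severity::2"),
    (4.0, "severity::3"),
    (2.0, "severity::4"),
)

# Time until inserted record is recallable: measured in minutes.
INDEXING_MINUTES: Sequence[Tuple[float, str]] = (
    (15.0, "severity::1"),
    (10.0, "severity::2"),
    (5.0, "severity::3"),
    (3.0, "severity::4"),
)


def classify(value: float, thresholds: Sequence[Tuple[float, str]]) -> Optional[str]:
    """Map a measurement to a severity label, or None if it is below every range.

    Assumptions: severity::1 requires strictly exceeding its bound ("Above ..."),
    and a value on a shared boundary is assigned to the more severe bucket.
    """
    top_bound, top_label = thresholds[0]
    if value > top_bound:
        return top_label
    for bound, label in thresholds[1:]:
        if value >= bound:
            return label
    return None
```

For example, a global recall that takes exactly 10 seconds does not exceed the ~severity::1 bound, so this sketch labels it ~severity::2.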

Weighting for Search Issues

We use a Fibonacci scale to assign weights to Search issues. Below are a few guidelines for setting issue weight:

  • Issues that include both ~backend and ~frontend work should have the two weights added together, so the total weight reflects the full work effort.
  • Spike issues are assigned a weight to help timebox the effort.
  • Bugs will not be given a weight.
  • Any issue weighted over 5 should be broken down into smaller iterative steps, unless it contains both ~backend and ~frontend work.

Weight Description
0 No effort or trivial effort (example: Documentation typo or Feature Flag Rollout)
1 Low effort (No Database migrations or Advanced Search migrations)
2 Low-Medium effort
3 Medium effort
5 High effort
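The weighting guidelines can be expressed as two small checks. This is an illustrative sketch only: the function names are hypothetical, and the bug label name type::bug is an assumption, since the guidelines only state that bugs are not weighted.

```python
from typing import Iterable, Optional

# Weights allowed by the table above (Fibonacci-style scale).
ALLOWED_WEIGHTS = frozenset({0, 1, 2, 3, 5})
BUG_LABEL = "type::bug"  # assumption: the label that marks an issue as a bug


def validate_part(weight: int) -> int:
    """Reject a per-discipline weight that is not on the scale."""
    if weight not in ALLOWED_WEIGHTS:
        raise ValueError(f"weight {weight} is not one of {sorted(ALLOWED_WEIGHTS)}")
    return weight


def issue_weight(labels: Iterable[str], backend: int = 0, frontend: int = 0) -> Optional[int]:
    """Total weight for an issue, or None for bugs, which are not weighted.

    Backend and frontend weights are validated separately and then added,
    so the total (e.g. 3 + 5 = 8) may exceed the largest single value.
    """
    if BUG_LABEL in set(labels):
        return None
    return validate_part(backend) + validate_part(frontend)


def needs_breakdown(weight: int, labels: Iterable[str]) -> bool:
    """True if an issue over 5 should be split; combined ~backend + ~frontend work is exempt."""
    has_both = {"backend", "frontend"} <= set(labels)
    return weight > 5 and not has_both
```

Validating each discipline's weight separately keeps every individual estimate on the scale while still allowing a combined total above 5.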

MR reviews

We have the following guidelines for doing reviews on Global Search Team MRs:

  • The MR author is responsible for deciding if the initial or maintainer reviews should be done by a Global Search Team member and can indicate that in a comment or by assigning the reviewers.
  • Draft status indicates that the MR is not ready to be merged, but the author could decide to assign a reviewer while in draft mode. Unless a review is urgent, the author should wait for the pipeline to pass before assigning a reviewer.
  • We use Conventional Comments to communicate effectively in review comments.
  • The merge request author resolves only the threads they feel they have fully addressed and whose discussions are concluded; the reviewer resolves everything else. When a merge request has many threads, it helps for the reviewer to return to the open threads and pick up where the previous discussions left off.

On-call escalation coverage

Because the Global Search Team's work requires special domain knowledge, such as Elasticsearch, we borrow team members with this knowledge from other groups to cover on-call escalation when we are understaffed, especially during holiday seasons. In general, we follow the dev on-call process. The Elasticsearch domain experts, identified by domain_expertise on their profile, may be contacted when SRE and dev on-call engineers cannot resolve a production incident. We don't expect the domain experts to work outside their normal working hours. In an emergency, we follow the rules and best practices outlined in our Incident Management handbook. To help team members catch up on the latest development status and resolve potential incidents, we have created a Global Search Incident Management document as a reference.

Onboard domain experts from other groups to cover production incident escalation

When onboarding domain experts from other groups to help cover production incident escalation, we may consider the following actions:

  • Suggest the team member add elasticsearch as their domain_expertise in their team member profile
  • Add the team member to the global-search-team Slack group, which SREs and other on-call engineers can use to contact them in an emergency
  • Create an access request to grant the team member access to the Elasticsearch cluster
  • Schedule walk-through sessions with the team member to go over the latest architecture and development status

Offboard domain experts from production incident escalation coverage

  • Remove the team member from the Slack group global-search-team
  • Revoke the team member's access to the Elasticsearch cluster

JTBD

We utilize the Jobs to be Done (JTBD) framework to better understand our customers’ and users’ needs. You can view the current list of our JTBD here.

Performance Testing

We are exploring Rally for performance testing of the Elasticsearch cluster. Workload data is determined using Kibana and stored in a Google Sheet (internal).

Resources

Documentation

Blog Posts

Product Demos

Dashboards


Advanced Global Search Rollout on GitLab.com

Steps and Enhancements

Global Search - JTBD
The jobs-to-be-done that the Global Search group is solving for.