Cells: Organization migration

This page contains information related to upcoming products, features, and functionality. It is important to note that the information presented is for informational purposes only. Please do not rely on this information for purchasing or planning purposes. The development, release, and timing of any products, features, or functionality may be subject to change or delay and remain at the sole discretion of GitLab Inc.
Status Authors Coach DRIs Owning Stage Created
proposed dbalexandre mkozono ayufan sxuereb sranasinghe luciezhao sranasinghe luciezhao devops tenant scale 2024-05-01

Summary

All user data will be wrapped in an Organization which provides isolation and enables moving an organization from one Cell to another, especially from the Legacy Cell.

In Protocells, we will define cohorts consisting of top-level groups that can be moved to an organization and then migrated to a Cell.

Defining the cohorts is the first part of the work, but we also need to build tooling to move organizations from the Legacy Cell to a Cell.

This design document focuses on the migration tooling that moves an organization from source to destination. It mentions cohorts and top-level group migration only in passing and doesn’t go into their implementation details.

Motivation

Cells is successful only if it meets its primary goal: horizontally scaling GitLab.com. To scale, we need to move the existing data on the Legacy Cell to new Cells, permanently removing load before we hit database scaling limits. This migration capability is essential for future-proofing our GitLab.com services as GitLab grows.

Goals

  1. Interruptible: If a migration is interrupted, for example by a compute failure or by an operator stopping it, it should resume where it left off.
  2. Hands Off: The migration should run in the background, without depending on a team member’s laptop to run it.
  3. Code Reuse: Geo was built to replicate data from one GitLab instance to another. We are doing the same, but at the organization level.
  4. No Data Loss: All data that lives on the source Cell should be available on the destination Cell. This means that we have to account for all data types such as Object Storage, Postgres, Advanced Search, Exact Code Search, Git, and Container Registry.
  5. No Cell Downtime: When migrating an organization the source Cell and destination Cell shouldn’t incur any downtime except for the organization being transferred.
  6. No Visible Downtime: The organization should not notice that we are migrating their data. We will never reach zero downtime: we will start with some downtime or a read-only period, and will continuously reduce it as we migrate higher-profile customers.
  7. Large Organizations Support: Able to migrate terabytes of data in a timely fashion. This means our tooling has to scale with the amount of data.
  8. Concurrency: Able to migrate multiple organizations at the same time without affecting one another.
  9. Cell Local: A migration should happen on the destination Cell to prevent a single point of failure for all migrations.
  10. Minimal Throwaway Work: We should iterate on the migration tooling instead of re-writing it multiple times.
  11. Observability: At any point in time we need to know how far the migration has progressed and whether there are any problems.
  12. Cell Aware: The migration tooling needs to also update information in Topology Service to start routing requests to the correct Cell.
  13. No User Visible Performance Impact: Migration should not degrade performance for either the source or the destination Cell.
  14. Rollback Capability: If we need to migrate an organization back to the source Cell, this should be possible.
  15. Dry Run Support: Operators should be able to test migrations with validation and time estimates without actually moving data.
  16. Security: All data in transit should be encrypted, and cross-cell communication must use proper authentication and authorization.
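
As an illustration of the Interruptible goal, the following is a minimal sketch of a step-based migration that records progress in a checkpoint and skips completed steps when it is run again. Everything here is an assumption made for illustration: the `MigrationCheckpoint` class, the step names (loosely based on the data types listed under No Data Loss), and the `copy_step` callback are hypothetical and do not describe the actual tooling.

```python
from dataclasses import dataclass, field


@dataclass
class MigrationCheckpoint:
    """Hypothetical record of which steps have finished for one organization."""
    organization_id: int
    completed_steps: list = field(default_factory=list)


# Hypothetical step names, loosely following the data types in the goals list.
STEPS = ["postgres", "object_storage", "git", "container_registry", "search_indexes"]


def run_migration(checkpoint, copy_step, should_stop=lambda: False):
    """Run the remaining steps in order, recording each one as it completes.

    Returns True when every step has completed, False if stopped early.
    """
    for step in STEPS:
        if step in checkpoint.completed_steps:
            continue  # finished in an earlier run, so it is not repeated
        if should_stop():
            return False  # operator stop or shutdown signal; progress is kept
        copy_step(checkpoint.organization_id, step)
        # A real tool would persist the checkpoint durably at this point.
        checkpoint.completed_steps.append(step)
    return True
```

Calling `run_migration` again with the same checkpoint continues from the first step that has not been recorded, which is the behavior the goal describes. Because progress is only recorded after a step finishes, a step interrupted midway would be re-run, so each step would need to be safe to repeat.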

Non-Goals

  • The decision of which organization lives in which cell.
  • Support for self-hosted installations.
  • Replacing any disaster recovery tooling.

Cohort Definitions

To satisfy the Protocells exit criteria, we expect to need to migrate a substantial portion of the top 1,000 active namespaces, which together consume about 67% of database time.

A cohort is a set of GitLab root namespaces and their data, selected as a single collection to incrementally transfer/migrate to other cells.

Cohort Naming Convention: We use 0 for the test cohort because it must complete successfully before we proceed to production cohorts. Subsequent cohorts (A, B, C, etc.) use letters to indicate they can be executed in parallel without sequential dependencies.
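
The ordering implied by this naming convention can be sketched as a small gating check. This is only an illustration: the `can_start` function is hypothetical, and it deliberately ignores any other prerequisites a cohort may have, such as eligibility criteria.

```python
def can_start(cohort_id, completed):
    """Return True if a cohort may begin, given the set of completed cohort IDs."""
    if cohort_id == "0":
        return True  # the test cohort has no prerequisite
    # Lettered cohorts wait only on cohort 0, not on each other,
    # so A, B, C, ... can run in parallel once the test cohort is done.
    return "0" in completed
```

For example, `can_start("B", {"0"})` is true even though cohort A has not completed, while `can_start("A", set())` is false.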

| Cohort ID | Cohort Name | Cohort size indication | Purpose | Simplified eligibility criteria | Impact on Exit criteria |
|---|---|---|---|---|---|
| Cohort 0 | Test cohort | Up to 100 orgs | Use test namespaces to test the transfer & migration process from end-to-end | | None |
| Cohort A | Subset of Inactive Free users | Up to 5,000 orgs | To establish Protocells as part of real, production use, and refine the migration process. | Inactive root namespaces; Free plan; Private only | Tiny impact on database size |
| Cohort B | Active opt-in Beta | Up to 1,000 orgs | Gain experience with real daily active users. | Opt-in / guided; Active root namespaces; Free, or paid; Private only | Tiny impact on WAL, LWLocks and database size |
| Cohort C | Top 1000 opt-in | Up to 300 orgs | Relieve the legacy cell | Opt-in / guided; Top 1000 root namespaces by database time; Private only; Prerequisite: Feature parity | At least 20% [1] decrease in WAL saturation, and Database size |
| Cohort D | Active long tail opt-in | Approximately 10,000 orgs | Relieve the legacy cell | Opt-in / self-service; Active root namespaces; Private only; Prerequisite: Feature parity; Free, or paid | At least 10% [2] decrease in WAL saturation, and Database size |

  • [1]: The 20% target is derived from 1/3 × 67% database time consumed by the top 1000 namespaces.
  • [2]: The 10% target comes from potentially moving 1/3 of 33% of long tail database time.
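
The arithmetic behind the two footnotes can be checked with a short calculation. The 67% share and the one-third fraction come from the text above; the variable names are only illustrative, and treating the long tail as the remaining 33% follows footnote [2].

```python
TOP_1000_SHARE = 0.67                 # database time used by the top 1,000 namespaces
LONG_TAIL_SHARE = 1 - TOP_1000_SHARE  # remaining database time (about 33%)
MIGRATED_FRACTION = 1 / 3             # fraction of each group assumed to be moved

cohort_c_reduction = MIGRATED_FRACTION * TOP_1000_SHARE
cohort_d_reduction = MIGRATED_FRACTION * LONG_TAIL_SHARE

print(f"Cohort C: {cohort_c_reduction:.1%}")  # Cohort C: 22.3%
print(f"Cohort D: {cohort_d_reduction:.1%}")  # Cohort D: 11.0%
```

The computed values sit slightly above the stated targets, which is consistent with the targets being worded as "at least 20%" and "at least 10%".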

Migration Design Documentation

  1. Migration documents (DMS, Cohorts, etc) will be added here

Organization Data Migration: DMS Integration with Dedicated Tooling
Summary This design document proposes integrating AWS Database Migration Service (DMS) into GitLab …