Offline Transfer

This page contains information related to upcoming products, features, and functionality. It is important to note that the information presented is for informational purposes only. Please do not rely on this information for purchasing or planning purposes. The development, release, and timing of any products, features, or functionality may be subject to change or delay and remain at the sole discretion of GitLab Inc.
Status Authors Coach DRIs Owning Stage Created
proposed SamWord SamWord devops create 2025-02-26

Summary

This blueprint describes changes to direct transfer to build a new tool, offline transfer, which will allow data to be exported from one GitLab instance and imported to another without a network connection between them.

Currently, direct transfer requires a network connection between the source and destination GitLab instances throughout the migration process. This change would allow GitLab data to be exported on an isolated GitLab instance and manually moved and imported into a destination instance, regardless of the network policies on either end. This change also maintains the functionality and efficiency of direct transfer migrations for those who can take advantage of it.

Current Status

Due to a team reorganization in 18.1, priorities for the Import group have shifted away from this feature for the time being. However, significant collaboration on the topic has already spun off several issues to refine this proposal further.

For full context on where this proposal left off, see the merge request that introduced it, which contains all discussions prior to the shift in team priorities: https://gitlab.com/gitlab-com/content-sites/handbook/-/merge_requests/12056.

Once these topics have been addressed and the proposal is accepted, the issues representing the work needed to implement it can be finalized. Issues that have already been created have the label ~"offline transfer:import" or ~"offline transfer:export".

Glossary of terms

In the past, we’ve had confusion over terms used within our importers. Here’s a glossary to clarify what they mean:

  • Air-gapped network: A network isolated from the public internet. See offline GitLab docs for more information. For the purposes of offline transfers, we should assume that both the source and destination instances are 100% air-gapped and no connection between the source and destination instances can be established. However, customers don’t need to be on an air-gapped network to use this feature.
  • Destination: The instance where group or project data is imported.
  • Entity: A group or project. An entity has many relations, such as milestones, labels, issues, merge requests, etc.
  • Entity prefix: A replacement string for an entity path to avoid extremely long file keys in object storage.
  • Offline transfer: The name of the proposed feature. It is a migration between GitLab instances where the destination instance is unable to make HTTP requests to the source to perform a regular direct transfer.
  • Relation: A resource, generally a model, that belongs to an entity. Relations include things like milestones, labels, issues, and merge requests. A relation can also be a self relation (attributes on the entity itself) or a more abstract concept such as user contributions.
  • Source: The instance that group or project data is exported from.

Motivation

Customers with no network connectivity between their GitLab instances or with strict network policies are unable to use direct transfer. These customers must use file-based import/export to manually export and import each group separately. It’s a tedious process and file-based group transfer is deprecated.

Goals

This feature should enable customers with the following restrictions to do convenient, semi-automated migrations:

  • Outside access to or from their network is disallowed
  • Cannot quickly and easily get an IP added to their firewall system
  • Have strict restrictions on what lives on the external machines that can connect to their networks (must have this specific VPN, antivirus, etc)

Additional Requirements

  • Users should be able to use the direct transfer API to export groups and projects to an object storage location and to import groups and projects from an object storage location. The API should be available first, before the UI.
  • Users should be able to use the UI to choose the groups and projects they want to migrate. That means direct transfer should support migrating many groups and projects at once.
    • It should be possible to transfer subgroups and projects.
  • The process cannot be fully automated, but it should be straightforward and convenient.
  • Customers with the strictest security requirements (“data cannot leave our network”) need to be supported.
  • Support optional encryption.
  • Support offline transfers for customers who are unable to use third party object storage due to organization data policies, etc.

Non-Goals

  • Continuous syncs: direct transfer should support one-off migrations, but not syncing the diffs as the source changes. This is another opportunity requested by customers, but not the focus of this architecture. This work should not prevent it, but this capability is out of scope.
  • Backup/restore: although offline transfer creates what could be considered “export” files, this is not a data backup solution or any other form of disaster recovery.

Nice-to-haves

  • Workarounds for older instances: Older versions of GitLab will not be supported, but workarounds may be possible. See Compatible GitLab versions for more details. As with all changes to direct transfer, older versions should be considered when possible.
  • Supporting an importer platform: This is not a requirement, but something that has been floated as an idea. Now that the importing side of direct transfer needs to support previously exported data, this is an opportunity to allow imports from any source, as long as the exported data fits a particular data schema. It would be ideal to open up this opportunity based on this proposal, but it is not a requirement because the complexity is significantly increased compared to a simpler option.

Proposal

With an offline transfer, the export and import sides of direct transfer will be separated into two sequential steps: export entities to files, then import those files into another instance. The user can manually move the files to an external object storage provider if the source is unable to upload them directly on export. Offline transfers won’t be able to benefit from the same efficiency as online migrations, which concurrently import and export entities, but they make migrations possible where no connection between instances exists.

Note: The model names, class names, file names, and bounded contexts specified in this document might be changed during implementation. However, the implementation should strive to use the Import bounded context and maintain consistency in naming conventions. Because offline transfer is quite similar to direct transfer, the implementation should also follow naming patterns already used in direct transfer when possible.

Proposed user flow

  1. The user on the source instance calls an offline transfer export API endpoint to export groups and projects to an object storage location.
  2. Groups and projects begin exporting to the provided object storage location. The user can check on the status of the export using another API endpoint.
  3. The user is notified by email that the export has completed.
  4. If the object storage location is not accessible by the destination instance, the user moves the export files over to another object storage location accessible to the destination. This step cannot be automated by offline transfer and may require manual data transfer by the user in accordance with their organization’s data security policies.
  5. On the destination, the user begins an offline transfer import via an API endpoint by providing the accessible object storage configuration and credentials (an illustrative request follows this list). They also provide a list of paths to entities on the source and a destination name and namespace, exactly the same as starting a direct transfer via the API. All entity relation files for the entities passed in the API parameters must exist in the provided object storage bucket, and their full source paths must be mapped in the metadata file before the import begins. However, not all entities in the bucket need to be imported. Users using disk storage will need an alternative method of making the export data available locally to each Sidekiq node (likely network storage); how the API call would refer to local disk for exported files is still to be determined.
  6. The bulk import will begin to process using the object storage bucket in place of a connection to the source instance. The remaining user flow will be the same as an online direct transfer migration.
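
For illustration, here is a sketch of the request a user on the destination might send in step 5, written with Ruby's Net::HTTP. The endpoint path is hypothetical (the route has not been decided), and the parameters mirror the Grape definition in the API design section below.

require 'json'
require 'net/http'
require 'uri'

# Hypothetical endpoint path; the actual route has not been decided.
uri = URI('https://destination-gitlab.example.com/api/v4/bulk_imports/offline')

request = Net::HTTP::Post.new(uri)
request['PRIVATE-TOKEN'] = ENV.fetch('GITLAB_API_TOKEN')
request['Content-Type'] = 'application/json'
request.body = {
  configuration: {
    access_key_id: ENV.fetch('AWS_ACCESS_KEY_ID'),
    secret_access_key: ENV.fetch('AWS_SECRET_ACCESS_KEY'),
    bucket_name: 'offline-transfer-exports'
  },
  entities: [
    {
      source_type: 'group_entity',
      source_full_path: 'top_level_group/group',
      destination_namespace: 'imported-groups'
    }
  ]
}.to_json

response = Net::HTTP.start(uri.host, uri.port, use_ssl: true) { |http| http.request(request) }
puts "#{response.code} #{response.body}"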

Note: Details of how customers who cannot use external object storage providers will be supported have not been fully defined yet. The design and limitations for those users will need to be worked out in more detail. For the first iteration, the focus will be on exporting into and importing out of an AWS S3-compatible object storage bucket.

Current direct transfer process

This diagram vastly simplifies direct transfer; however, it shows how frequently the source and destination instances communicate via network requests. Not all entities download relation files; some make GraphQL queries or REST requests to the source instead. Entities and their relations can be processed concurrently within each stage in online migrations (group stages, project stages).

sequenceDiagram
    participant Source
    participant Destination
    actor Owner
    Owner->>+Destination: Begin an import with given source URL and credentials
    Destination->>+Source: Get source instance details for validation
    Source-->>-Destination: Source version, etc.
    Destination-->Destination: Begin import (async)
    Destination-->>-Owner: Started
    activate Destination
    loop For each imported Entity
        Destination-)Source: Begin exporting entity relations (async)
        loop For each Entity Relation
          Destination->>+Source: Get the status of relation export
          Source-->>-Destination: Relation export status, loop if not ready
          Destination->>+Source: Download relation export
          Source-->>-Destination: Relation export data
          Destination->>Destination: Extract, transform, and load relation data
      end
    end
    Destination-->-Owner: Notify import has finished

Proposed offline transfer process

This is similarly simplified, but it demonstrates how the export and import processes are now split and no requests to either instance need to be made. If both the source and destination have access to the same object storage, the user does not need to manually move export data to another object storage location. However, offline transfer is not currently intended to parallelize the export and import process even when exporting and importing from the same location.

sequenceDiagram
    box Source Network
      participant Source
      participant ObjectStorageSource
    end
    actor Owner
    box Destination Network
      participant ObjectStorageDestination
      participant Destination
    end
    Note right of ObjectStorageSource: This is an external object storage location<br>and does not need to be provisioned<br>within the GitLab configuration
    Note right of ObjectStorageDestination: This is an external object storage location<br>and does not need to be provisioned<br>within the GitLab configuration

    Owner->>Source: Begin offline transfer export
    Source->>ObjectStorageSource: Validate access to object storage
    Source->>Source: Begin offline transfer export (async)
    Source-->>Owner: Notify offline transfer export began
    loop For each exported entity relation
        Source->>ObjectStorageSource: Write relation data to .ndjson file
    end
    Source-->>Owner: Notify export complete

    Owner->>ObjectStorageSource: Download exported files
    Owner->>ObjectStorageDestination: Upload exported files

    Owner->>Destination: Begin import from ObjectStorageDestination
    Destination->>ObjectStorageDestination: Get source instance metadata
    ObjectStorageDestination-->>Destination: Source metadata file
    Destination-->Destination: Begin import (async)
    Destination-->>Owner: Started
    loop For each entity relation (no more status checks)
        Destination->>ObjectStorageDestination: Fetch relation export file
        ObjectStorageDestination-->>Destination: Relation export data
        Destination->>Destination: Extract, transform, and load relation data
    end
    Destination-->>Owner: Notify import has finished

Compatible GitLab versions

Minimum destination version

The earliest GitLab version where offline transfer imports are supported.

Minimum source version

The earliest GitLab version where offline transfer exports are supported. Compatibility with earlier versions may be accomplished through the use of an export tool, such as Congregate or an external script, but will not be officially supported by the Import team.

While out of scope of this architecture, it’s possible that Congregate can be updated to act as an export orchestrator for earlier versions of GitLab, replacing the role of API calls from the destination instance. A potential flow using Congregate or other export tool to orchestrate offline exports would look something like this:

sequenceDiagram
    box Source Network
      participant Source
      participant ExportTool
    end
    actor Owner
    box Destination Network
      participant ObjectStorage
      participant Destination
    end
    Note right of ObjectStorage: This is an external object storage location<br>not object storage provisioned<br>within the GitLab configuration
    Owner->>ExportTool: Begin exporting list of relations to disk
    ExportTool->>Source: Get source instance details for validation
    Source-->>ExportTool: Source version, relation structure, etc.
    ExportTool->>ExportTool: Write metadata file
    loop For each exported entity
        ExportTool-)Source: Begin exporting entity relations (async)
        loop For each entity relation
          ExportTool->>Source: Get the status of relation export
          Source-->>ExportTool: Relation export status, loop if not ready
          ExportTool->>Source: Download relation export
          Source-->>ExportTool: Relation export data
          ExportTool->>ExportTool: Write relation data to .ndjson file to disk/object storage
        end
    end
    ExportTool->>Owner: Notify export complete

    Owner->>ObjectStorage: Upload exported files

    Owner->>Destination: Begin an import from ObjectStorage
    Destination->>ObjectStorage: Get source instance metadata
    ObjectStorage-->>Destination: Source metadata file
    Destination-->Destination: Begin import (async)
    Destination-->>Owner: Started
    loop For each entity relation (no more status checks)
        Destination->>ObjectStorage: Fetch relation export file
        ObjectStorage-->>Destination: Relation export data
        Destination->>Destination: Extract, transform, and load relation data
    end
    Destination-->>Owner: Notify import has finished

Design and implementation details

New Import Architecture

Offline API design

Once data has been exported from the source instance, the user will be able to input the credentials of the object storage instance that contains their exported data.

  • The object storage provider must be one that’s currently supported in GitLab’s configuration, because GitLab already has the gems necessary to interface with it. In future iterations, additional Fog providers may be added if needed.

  • Create a new file download service for bulk imports, similar to BulkImports::FileDownloadService, that downloads files using the provided S3 configuration (see the sketch after this list). Validations such as file size and type on the remote file can be done in this service. These services abstract the work of fetching relation files away from the pipelines themselves.

  • When the user begins an offline import on the destination, they call a new API endpoint to begin the offline import with the following params:

    # These params may change depending on object storage implementation
    requires :configuration, type: Hash, desc: 'Object storage configuration' do
      requires :access_key_id, type: String, desc: 'Object storage access key ID'
      requires :secret_access_key, type: String, desc: 'Object storage secret access key'
      requires :bucket_name, type: String, desc: 'Object storage bucket name where all files are stored'
    end
    requires :entities, type: Array, desc: 'List of entities to import' do
      requires :source_type,
        type: String,
        desc: 'Source entity type',
        values: %w[group_entity project_entity]
      requires :source_full_path,
        type: String,
        desc: 'Relative path of the source entity to import'
      requires :destination_namespace,
        type: String,
        desc: 'Destination namespace for the entity'
      optional :destination_slug,
        type: String,
        desc: 'Destination slug for the entity'
      optional :migrate_projects,
        type: Boolean,
        default: true,
        desc: 'Indicates group migration should include nested projects'
      optional :migrate_memberships,
        type: Boolean,
        default: true,
        desc: 'The option to migrate memberships or not'
    end
    

    The main difference between this new API endpoint and the existing direct transfer endpoint is that it accepts params for an object storage bucket instead of a source instance configuration. It may also call a new service to handle creating the BulkImport record for offline imports if that logic is substantially different from BulkImports::CreateService.

  • BulkImports::Configuration is updated to store credentials for the object storage bucket and a hash of mappings tying source_full_path to object_storage_file_prefix. These mappings are stored in a metadata file.
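
As referenced above, here is a minimal sketch of what such a download service might look like, assuming fog-aws (which GitLab already bundles for object storage) and the bucket and object_storage_credentials attributes proposed for BulkImports::Configuration later in this document. The class name, method signatures, and validation placement are assumptions, not settled design.

# Sketch only: class and attribute names are assumptions.
module BulkImports
  class OfflineFileDownloadService
    def initialize(configuration:, file_key:, tmpdir:, filename:)
      @configuration = configuration # BulkImports::Configuration holding object storage credentials
      @file_key = file_key           # e.g. "export_2025-09-18_1hrwkrv/project_1/issues.ndjson"
      @tmpdir = tmpdir
      @filename = filename
    end

    def execute
      remote_file = bucket.files.get(@file_key)
      raise BulkImports::Error, "File not found in bucket: #{@file_key}" unless remote_file

      # File size, content type, and similar validations would happen here,
      # mirroring BulkImports::FileDownloadService.
      file_path = File.join(@tmpdir, @filename)
      File.binwrite(file_path, remote_file.body)
      file_path
    end

    private

    def connection
      # Region/endpoint options omitted for brevity.
      Fog::Storage.new(
        provider: 'AWS',
        aws_access_key_id: @configuration.object_storage_credentials[:access_key_id],
        aws_secret_access_key: @configuration.object_storage_credentials[:secret_access_key]
      )
    end

    def bucket
      connection.directories.get(@configuration.bucket)
    end
  end
end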

Import metadata file structure

WIP: Do not rely on this specification for development until it’s finalized

Offline transfers will need a metadata file to map entity source paths to file keys in object storage. Since object storage is always a flat structure and disk storage is always a nested structure, opting for a flat object storage layout with information on how to link entities seems best.

The metadata file holds the following information:

  • instance_version: Version of the source instance.
  • instance_enterprise: Whether or not the source instance was enterprise edition.
  • export_tag: Prefix for all files included in the current export. This allows multiple exports of the same entities to exist in object storage at the same time.
  • source_hostname: Source hostname specified by the user.
  • batched: Whether or not relations were exported in batches. In the current architecture, either all relations are exported in batches or none are. There is no way to batch some relations but not others.
  • entities_mapping: Hash mapping each entity’s full path to its object storage entity prefix. This shortens full paths into a short key to avoid excessively long object storage keys. It has no effect on the order in which entities are imported or on the hierarchy of the entities. Entity prefixes are the entity type and its ID (for example, project_52 or group_28).

Example metadata file:

{
  "instance_version":"17.0.0",
  "instance_enterprise":true,
  "export_tag":"export_2025-09-18_1hrwkrv",
  "source_hostname":"https://offline-environment-gitlab.example.com",
  "batched":true,
  "entities_mapping":
  {
    "top_level_group":"group_1",
    "top_level_group/group":"group_2",
    "top_level_group/group/first_project":"project_1",
    "top_level_group/group/second_project":"project_2",
    "top_level_group/another_group":"group_3"
  }
}

File keys in object storage follow the format #{export_tag}/#{entity_prefix}/#{relation_name}.ndjson. Relation names are defined in each group/import_export.yml and project/import_export.yml. Archive file relations and batched relations include their .tar.gz files and batch files in a relation folder. For example (an illustrative snippet follows this list):

  • group_1-self.json - self relations are exported as JSON files
  • group_1-milestones.ndjson - an example of a non-batched tree relation export
  • project_1-issues/batch_1.ndjson - an example of one batch of a batched tree relation export
  • project_1-repository.tar.gz - an example of a non-batched, non-tree relation export
  • project_1-uploads/batch_1.tar.gz - an example of a batched, non-tree relation export
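
As noted above, here is a small illustrative snippet (not part of the specification, which is still WIP) showing how an import-side helper might resolve an entity prefix from entities_mapping and assemble relation file keys using the format described in this section. All variable names are hypothetical.

require 'json'

# Illustrative only: resolve file keys for one entity using the metadata file
# described above, following the stated "export_tag/entity_prefix/relation_name" layout.
metadata = JSON.parse(File.read('metadata.json'))

export_tag    = metadata.fetch('export_tag')       # "export_2025-09-18_1hrwkrv"
entity_prefix = metadata.fetch('entities_mapping')
                        .fetch('top_level_group/group/first_project') # "project_1"

issues_key        = "#{export_tag}/#{entity_prefix}/issues.ndjson"           # non-batched tree relation
repository_key    = "#{export_tag}/#{entity_prefix}/repository.tar.gz"       # non-batched, non-tree relation
uploads_batch_key = "#{export_tag}/#{entity_prefix}/uploads/batch_1.tar.gz"  # batched relation in a relation folder

puts [issues_key, repository_key, uploads_batch_key]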

Importing user contributions in offline transfer vs direct transfer

The user_contributions relation export file was implemented in direct transfer, but it was never fully used in the direct transfer import process. This file will be necessary in offline transfer because no API calls can be made to the source instance to fetch missing source user attributes. This means fetching user contribution data will be notably different between direct transfer and offline transfer:

  • The user_contributions relation can be skipped entirely for direct transfer exports.
  • A new pipeline for user_contributions will need to be implemented for offline transfer and must only run in offline transfer imports.
  • SourceUsersAttributesWorker does not need to be enqueued at all for offline exports unless source users are created before importing from user_contributions.ndjson. In that case, the worker must run a new service that references user_contributions.ndjson instead of making API calls to the source.

Export Architecture

Current export limitations

In direct transfer, the destination instance periodically queries the status of exporting relations. There’s no export-side process to determine when relation exports are done and ready to be transferred to an object storage location accessible to the destination instance. New models will need to be made within GitLab so that export progress can be tracked, the user can be notified when the export is complete, and export errors can be better tracked.

New offline transfer models and relationships

The only new model is Import::Offline::Export, which connects all related BulkImports::Export records and determines an overall export status (a model sketch follows the class diagram below). Import::Offline::Export would also serve as the source of truth for compiling export metadata to be written to metadata.json.

BulkImports::Configuration attributes needed for offline transfer imports will be reused for Import::Offline::Export to store object storage export credentials on the source. BulkImports::Export and BulkImports::ExportBatch will continue to be used for offline exports, but BulkImports::ExportUpload records will not be created. Instead, compressed export files will be uploaded directly to the object storage location configured in BulkImports::Configuration.

---
title: Offline transfer export model relationships
config:
  class:
    hideEmptyMembersBox: true
---
classDiagram
    class OfflineExport["Import::Offline::Export"] {
      +Enum status(created, started, finished, timed out, failed, canceled)
      +Boolean batched
      +Boolean has_failures
      +update_has_failures
    }

    class BulkImportsExport["BulkImports::Export"] {
      +Enum status
    }

    class BulkImportsConfiguration["BulkImports::Configuration"] {
      +Boolean offline
      +String export_prefix
      +String bucket
      +String source_hostname
      +Hash offline_entities_mapping
      +AttrEncrypted<Hash> object_storage_credentials
    }

    class BulkImportsExportUpload["BulkImports::ExportUpload"]

    class BulkImportsExportBatch["BulkImports::ExportBatch"]

    %% Relationships
    OfflineExport "*" -- "1" User
    OfflineExport "0..1" -- "*" BulkImportsExport
    OfflineExport "0..1" -- "1" BulkImportsConfiguration

    BulkImportsExport "1" -- "0..1" BulkImportsExportUpload
    BulkImportsExport "1" -- "*" BulkImportsExportBatch
    BulkImportsExportBatch "0..1" -- "0..1" BulkImportsExportUpload

Backend export process

The process for exporting relations will be nearly identical to the export process in direct transfer. However, a new worker on the source instance will orchestrate exporting entities instead of the destination instance:

  1. The user calls an API endpoint with object storage credentials, configuration, and a list of entities to export.
  2. The API endpoint executes a new service, Import::Offline::Export::CreateService, to create Import::Offline::Export.
  3. Import::Offline::Export::CreateService validates the configuration, the user’s permissions for the entities, and access to the object storage provider. If any are invalid, an error is returned to the user.
  4. When validations pass, Import::Offline::Export::CreateService creates a new Import::Offline::Export. It begins the export using a new async worker, Import::Offline::ExportWorker, and returns a success message to the user. This new worker will function similarly to BulkImportWorker.
  5. Import::Offline::ExportWorker executes a new service, Import::Offline::Export::ProcessService, which executes BulkImports::ExportService for each entity provided. This worker re-enqueues itself on a delay while exports are still processing (a sketch of this worker follows the list).
  6. BulkImports::Export is created and exported like in direct transfer. However, export files are written to the configured object storage instead of creating BulkImports::ExportUpload objects.
  7. When the exports are complete, Import::Offline::Export::ProcessService writes metadata.json based on Import::Offline::Export attributes, its BulkImports::Export records, instance metadata, and BulkImports::Configuration. A new service might be the best approach for writing metadata.json files.
  8. Import::Offline::Export::ProcessService sets the status of Import::Offline::Export to complete and sends a notification email to the exporting user.
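
Here is a minimal sketch of what Import::Offline::ExportWorker could look like, assuming the re-enqueue pattern used by BulkImportWorker. The delay constant, status predicates, and worker settings are assumptions rather than settled implementation details.

# Sketch only: delay, status predicates, and worker settings are assumptions.
module Import
  module Offline
    class ExportWorker
      include ApplicationWorker

      PERFORM_DELAY = 5.seconds

      data_consistency :sticky
      feature_category :importers
      idempotent!

      def perform(offline_export_id)
        offline_export = Import::Offline::Export.find_by_id(offline_export_id)
        return if offline_export.nil? || offline_export.canceled?

        # Kicks off (or continues) relation exports for each entity and writes
        # metadata.json once everything has finished (steps 5-8 above).
        Import::Offline::Export::ProcessService.new(offline_export).execute

        # Re-enqueue on a delay while relation exports are still processing.
        self.class.perform_in(PERFORM_DELAY, offline_export_id) if offline_export.reload.started?
      end
    end
  end
end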

The offline export process supports canceling and stopping exports that have unexpectedly failed or timed out. Direct transfer has established patterns for these actions, which should be followed for consistency when possible:

  • When a user cancels an offline transfer export, no new relation exports should begin.
  • Similar to BulkImport records, Import::Offline::Export records that have not been updated in over 24 hours are considered stale and are cleaned up. It might be possible to reuse the existing BulkImports::StaleImportWorker to clean up these records.
  • A failing BulkImports::Export should not automatically fail the entire Import::Offline::Export. Instead, has_failures should be set to true on the Import::Offline::Export record. Import::Offline::Export is only considered failed when all exported relations fail.

User Contribution Mapping

User contribution and membership mapping relies on a source hostname to map a real user to a placeholder user. In the current import process, a source hostname is the host of the import URL used to import groups and projects. Each importer must make repeated, successful API calls to the source host to complete the import. Offline transfer, however, never contacts the source instance directly, raising unique challenges for user contribution mapping that are not a concern in online transfers:

  • The destination instance never contacts the source instance directly, making it impossible to determine where the export data originated from with certainty. Additionally, the data can be maliciously manipulated.
  • One object storage location may have data from multiple hosts.
  • An incorrect or vague hostname does not allow reassigned users to be confident in what contributions they may be accepting, opening an opportunity for abuse.
  • Hostnames are required and cannot be blank for user contribution mapping. Users who do not want to expose the source hostname for security or privacy reasons must choose some alias for the hostname.

Proposed solution:

Given these challenges, exporting users must provide a source hostname when generating an offline transfer export. For convenience, it will be written to metadata.json so the importing user does not need to re-specify it. The export UI, when it comes time to implement it, must warn the user that this is the hostname users will see on the destination when accepting their placeholder user reassignments. Using a different hostname on subsequent imports will result in users having to accept placeholder reassignments a second time.

Additionally, the email to accept user contributions must be updated to display a warning that the source hostname was manually entered by a user and it should not be trusted. It should also explicitly tell the user that the import type was offline transfer. This way, a malicious user can’t send a reassignment request to a user from offline transfer where the source hostname was manually set to a trusted domain like https://github.com.

Potential user mapping enhancements:

  • Allow offline transfer migrations to set source_hostname as a plain string so that users who wish to not expose the source instance hostname can use any alias they wish.
  • Ideally, individual users should be able to limit who can send them requests for reassignment (for example, specific users or owners of a trusted group) or to opt out entirely.
  • For offline exports out of GitLab.com, set the hostname to https://gitlab.com and do not allow the user to edit it in the UI. A user can always edit the export files manually, but it could provide a bit of convenience and consistency for regular users.
  • Implement a mechanism to verify ownership of a source hostname and prevent placeholder user reassignment from non-verified hosts. This may be impossible to implement.

Iterations

First release steps:

  • Define and document metadata and export file structures.
  • Build components necessary to export groups and projects to object storage with API support to begin offline exports on the source.
  • Convert data that is currently imported into the destination using API calls directly to the source instance into file exports so they will be supported by offline transfer.
  • Begin work to update the import side according to file discovery and metadata definitions. Offline transfer imports can be developed in parallel with offline transfer exports, but caution must be taken to ensure consistency across relations.

After first release iterations:

  • Build a UI in GitLab to export groups offline for a better user experience.
  • Implement flexibility in export upload locations. In theory, it doesn’t matter where we fetch export files from, as long as it’s accessible to the destination instance and connection to the storage location is secure. Possible support could include:
    • Object storage providers other than AWS S3
    • Network storage (Self-Managed only)
    • Local disk storage (Self-Managed only)