GitLab CI for GitHub SCM

This page contains information related to upcoming products, features, and functionality. It is important to note that the information presented is for informational purposes only. Please do not rely on this information for purchasing or planning purposes. The development, release, and timing of any products, features, or functionality may be subject to change or delay and remain at the sole discretion of GitLab Inc.
Status Authors Coach DRIs Owning Stage Created
proposed mfanGitLab allison.browne rutshah carolinesimpson devops verify 2024-11-26

Summary

Some customers using GitHub Source Code Management want to integrate with GitLab’s CI solutions to run pipelines and leverage our security features. They do not want to mirror the source code on GitLab and want instantaneous pipelines for their pushes.

Motivation

Our current approach with GitHub <-> GitLab integration is with mirroring at a minimum 5 minute interval. This is too slow for feedback, uses long lived personal access tokens, and requires a copy of GitHub’s source code on GitLab. Additionally, as the entire project and all pipelines are run under the user that initiates the integration; this could lead to a permission mismatch.

With an effective solution, we can sell GitLab CI/CD and other Ops features to businesses that use GitHub as a source control management tool.

CI tools in the market currently use GitHub source to run CI/CD externally. Buildkite, CircleCi, TeamCity, Jenkins are examples where the runner pulls directly from GitHub. In such examples, the pipeline config file can live either on GitHub or in the services.

Goals

As an initial MVC we want to support

  1. A GitHub App (on/off GitHub’s marketplace - this doesn’t need to be in MVP, users can install with a direct link)
  2. Near instant pipeline creation upon GitHub pushes
  3. We will only be communicating with GitHub via short lived tokens
  4. Correct user management system, through direct user mapping
    1. Each user on GitHub’s side should have a billable seat on GitLab
    2. GitLab users should have the least privilege needed to run pipelines
  5. Runners are the only place to interact (fetch/pull) with the source code
    1. Customer source code is stored in a GitHub Repo and is never stored in a GitLab repo

Non-Goals

Items that are out of scope (For MVP) include in order of importance

  1. GitLab Self-Managed
    1. Why: This should be quite do-able. Assuming we have docs for SM users to set-up. We should get this from the MVP
  2. GitHub Enterprise
    1. Why: This will require extra set-up, licensing, testing, and manual configurations as GitHubApp and Actions are different on the Enterprise level
  3. GitLab Security Scans, Pipeline execution policies, Merge Request widgets from artifacts (Junit/Cobertura)
    1. Why: Because the source code is in GitHub and our scans only work natively there’ll be extra work to fetch and analyze the code
    2. Similary for the MR widget, since the Merge Request is on GitHub side we can’t modify it
  4. Different actions on GitHub side that might want/require CI such as
    1. Forks
    2. Tags
    3. Branch name changes, etc
    4. Why: These are bit more complex as they rely on different webhooks as well as sha configurations

These are do-able, but just to reduce scope and complexity we can iterate on additional features.

Proposal

Architecture

GitHub will communicate with GitLab via our GitHubApp via webhooks. GitLab will generate a user access token via the GitHub App for the user that triggered the webhook, and create a pipeline for that user. When runners poll GitLab’s api, GitLab will provide the GitHub repo location and access token for the runner to fetch the code from. GitLab will then use GitHub’s API to update the commit with the pipeline status.

Design and implementation details

Prerequisites

The process begins with a GitHub organization owner creating the projects in GitLab that would be used as CI/CD.

We would reuse CI/CD For External Repositories, and have a “checkbox” to not mirror the repository (UI pending). During the import we’ll save the GitHub repository id. Issue for discussion

After import, we would automatically enable a new GitLab integration called GitHub SCM for these projects.

Then the customer will use a direct link to install our GitHub App, and choose which repos on GitHub to install it on. Upon GitHub app installation, and for OAuth, each user will need to accept the terms and conditions listed in GitHubApp Installation. The App’s permissions will be set-up by GitLab and hosted on GitHub. This will be set up with the basic permission of (read_repository, and write_commit_status). We can modify these permissions and add new ones anytime via Modifying GitHub App, although this will require users to re-accept the conditions and re-authenticate.

Each user that would want to trigger pipeline would need to OAuth with GitHub. Either via OAuth login or connecting their GitLab account with GitHub

Once everything is set-up properly

The steps here will be in accordance with the diagram above

  1. User on GitHub initiates a push to a branch
  2. GitHub will automatically trigger a webhook via our GitHubApp. This’ll send a push payload to GitLab
    1. GitLab Webhook Push Payload
  3. GitLab will use the signed payload’s sender_id and map that to a GitLab user.
    1. If the GitLab user does not have permissions to run pipelines. A pipeline will be created but will fail immediately. Anyone with correct permissions can re-try this pipeline.
  4. If the GitLab user does have permissions to create pipelines.
    1. GitLab will use the refresh_token of the user to generate a new access_token, so we can act on behalf of the user
    2. This access_token is repository specific, and will only have
      1. Read access to repos
      2. Write access to commit_status (to update the commit with pipeline details)
    3. There’s no way to revoke the access_token without also revoking the refresh_token. But if want to be safe, we can just re-generate a new access_token after the pipeline has finished.
    4. The pipeline will be generated by pulling a .gitlab-ci.yml file present on GitHub side.
      1. To make this easier in the future to work with includes; we’ll require this .yml file to be in a gitlab folder
  5. GitLab calls CreatePipelineService sidekiq job and sets up the pipeline.
  6. Rails will pass the necessary params including the access_token to the runner. It will also update the pipeline on GitHub side to “running” via GitHub API.
  7. The runner will directly call GitHub with the access_token to fetch and pull the repository for that branch
  8. When the pipeline finishes, runner will update GitLab as normal
  9. GitLab will use the installation access token to post a commit on GitHub with the pipeline’s sha. Notifying them the commit has a finished pipeline.
    1. GitHub commit status API
    2. This will be limited to the 5k/15k rate limits per hour. But this should be sufficient for our customers.

Step 3 expanded:

There are many ways this authorization could play out. More details in Corresponding Implementation Issue.

We will enforce a 1:1 user mapping on GitHub <-> GitLab. Currently this is product’s preferred approach. This allows us to make sure the product is properly licensed with the correct number of seats.

Each user will need to manually link their GitHub accounts to their GitLab profiles. Which can only be done once GitHub is used as an authentication provider.

The downside is that users or bot accounts that do not have a GitLab account mapping will not be able to run pipelines. The workaround is to create a failed pipeline that maintainers can manually run.

Step 4.4 - As the gitlab-ci.yml file is hosted externally on GitHub.com. Only this file will then be pulled in when a webhook is triggered. This’ll keep the source code and config file in sync.

There are certain yaml definitions which would not work with this approach. The includes: yaml definition would not work, similarly cross-pipeline configs, security scans, pipeline test reports, execution policies would need additional patching to work.

Alternative Solutions

For Step 3.

Another idea is to use service accounts. Have the customer create a service account they want to use and assign it to a project with the permission set or custom role. Then we’ll have a UI allowing them to select which service account they want to use for these GitHub webhook actions.

This’ll need composite identities, as cross-project pipelines can exist. Additionally, there’ll be extra considerations for billing as GitHub users can share GitLab seats

For Step 4.

Another option is for the GitHub App to use Installation Access Tokens.

This would be a great option if not for the rate limits. As IAT have a limit of 5k/hour calls for GitHub.com and 15k/hour for GitHub Enterprise, this is not enough for large customers without a caching mechanism in place. This limit is on the organization level and shared between all their repositories.

Authentication as a GitHub App installation

Questions for Reviewers

  1. Compliance Question: Do we need additional terms and conditions depending on how we run & keep GitHub code?
    1. As the job logs will be kept on GitLab side
    2. Answer: (Rutshah) should be okay to keep job logs
  2. Compliance Question/PM: Some customers don’t want anything else aside from runner interacting with the source code. This requirement is unclear; does referencing, have snipplets of file/artifact names in the job logs invalidate this requirement?
    1. Partial Answer: (Rutshah) These passby references should be okay
Last modified April 28, 2025: Cleanup and reorg shortcodes (eef3c341)