GitLab Modular Monolith
This page contains information related to upcoming products, features, and functionality.
It is important to note that the information presented is for informational purposes only.
Please do not rely on this information for purchasing or planning purposes.
The development, release, and timing of any products, features, or functionality may be
subject to change or delay and remain at the sole discretion of GitLab Inc.
Summary
The main GitLab Rails project has been implemented as a large monolithic application, using the
Ruby on Rails framework. It has over 2.2 million lines of Ruby code and hundreds of engineers
contributing to it every day.
The application has been growing in complexity for more than a decade. The
monolithic architecture has served us well during this time, making it possible
to keep high development velocity and great engineering productivity.
Even though we strive to have an approachable open-core architecture, we need to strengthen
the boundaries between domains to retain velocity and increase development predictability.
As we grow as an engineering organization, we want to explore a slightly different, but related,
architectural paradigm: a modular monolith design, while still using a monolithic architecture
with satellite services.
This should allow us to increase engineering efficiency, reduce cognitive load, and eventually
decouple internal components to the extent that we can deploy and run them separately if needed.
Motivation
Working with a large and tightly coupled monolithic application is challenging:
Engineering:
- Onboarding engineers takes time. It takes a while before engineers feel
productive due to the size of the context and the amount of coupling.
- We need to use the CODEOWNERS file feature for several domains, but these rules are complex.
- It is difficult for engineers to build a mental map of the application due to its size.
Even apparently isolated changes can have far-reaching repercussions
on other parts of the monolith.
- Attrition/retention of engineering talent. It is fatiguing and demoralizing for
engineers to constantly deal with the obstacles to productivity.
Architecture:
- There is little structure inside the monolith. We have attempted to enforce the creation
of some modules, but we have no company-wide strategy for what the functional parts of the
monolith should be and how code should be organized.
- There is no isolation between existing modules. Ruby does not provide
out-of-the-box tools to effectively enforce boundaries. Everything lives
under the same memory space.
- We rarely build abstractions that can boost our efficiency.
- Moving stable parts of the application into separate services is impossible
due to high coupling.
- We are unable to deploy changes to specific domains separately and isolate
failures that are happening inside them.
Productivity:
- High median-time-to-production for complex changes.
- It can be overwhelming for wider community members to contribute.
- Reducing testing times requires diligent and persistent efforts.
Goals
- Increase the development velocity and predictability through separation of concerns.
- Improve code quality by reducing coupling and introducing useful abstractions.
- Build abstractions required to deploy and run GitLab components separately.
How do we get there?
While we do recognize that modularization is a significant technical endeavor,
we believe that the main challenge is organizational rather than technical. Not only
do we need to design the separation so that modules are decoupled in a pragmatic way
that works well on both GitLab.com and self-managed instances, but we also need to align
modularization with the way in which we want to work at GitLab.
There are many aspects and details required to make modularization of our
monolith successful. We will work on the aspects listed below, refine them, and
add more important details as we move forward towards the goal:
- Deliver modularization proof-of-concepts that will provide key insights.
- Align modularization plans to the product structure by defining bounded contexts.
- Separate domains into modules that will reflect product structure.
- Start a training program for team members on how to work with decoupled domains (TODO)
- Build tools that will make it easier to build decoupled domains through inversion of control (TODO)
- Introduce hexagonal architecture within the monolith
- Introduce clean architecture with one-way-dependencies and host application (TODO)
- Build abstractions that will make it possible to run and deploy domains separately (TODO)
Status
In progress.
- The Bounded Contexts working group concluded in April 2024. It defined a list of bounded
contexts to be enforced for the GitLab Rails domain and infrastructure layer.
Decisions
- ADR-001: Modularize application domain? Start with modularizing
the application domain and infrastructure code.
- ADR-002: Define bounded context around feature categories as an SSoT in the code.
- ADR-003: Assign stewards to all modules and libraries.
Glossary
modules
are Ruby modules and can be used to nest code hierarchically.
namespaces
are unique hierarchies of Ruby constants. For example, Ci::, but also Ci::JobArtifacts:: or Ci::Pipeline::Chain::.
packages
are Packwerk packages to group together related functionalities. These packages can be big or small depending on the design and architecture. Inside a package all constants (classes and modules) have the same namespace. For example:
- In a package ci, all the classes would be nested under the Ci:: namespace. There can also be nested namespaces like Ci::PipelineProcessing::.
- In a package ci-pipeline_creation, all classes are nested under Ci::PipelineCreation, like Ci::PipelineCreation::Chain::Command.
- In a package ci, a class named MergeRequests::UpdateHeadPipelineService would not be allowed because it would not match the package’s namespace.
- This can be enforced easily with Packwerk-based RuboCop cops.
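The package-to-namespace rule above can be sketched as a plain Ruby check. This is a hypothetical illustration, not Packwerk's or RuboCop's implementation: the mapping used here ("-" separates nested namespaces, "_" separates words) is an assumption inferred from the ci and ci-pipeline_creation examples, and the method names are made up.

```ruby
# Hypothetical helper: derive the expected namespace from a package name.
# Assumed rule: "-" starts a nested namespace, "_" joins words in CamelCase.
#   "ci"                   => "Ci"
#   "ci-pipeline_creation" => "Ci::PipelineCreation"
def expected_namespace(package_name)
  package_name.split("-").map do |segment|
    segment.split("_").map(&:capitalize).join
  end.join("::")
end

# True when the constant is the package namespace itself or nested under it.
# Checking for the trailing "::" avoids false matches such as "Cix::Foo" vs "Ci".
def matches_package_namespace?(constant_name, package_name)
  namespace = expected_namespace(package_name)
  constant_name == namespace || constant_name.start_with?("#{namespace}::")
end

puts matches_package_namespace?("Ci::PipelineCreation::Chain::Command", "ci-pipeline_creation") # prints true
puts matches_package_namespace?("MergeRequests::UpdateHeadPipelineService", "ci")                # prints false
```

In practice this kind of rule would be enforced statically by a linter over file paths and constant definitions rather than by a runtime helper. The sketch only makes the naming contract explicit.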
bounded context
is a top-level Packwerk package that represents a macro aspect of the domain. For example: Ci::, MergeRequests::, Packages::, etc.
- A bounded context is represented by a single Ruby module/namespace. For example, Ci:: and not Ci::JobArtifacts::.
- A bounded context can be made of one or more Packwerk packages. Nested packages would be recommended if the domain is quite complex and we want to enforce privacy among all the implementation details. For example, Ci::PipelineProcessing:: and Ci::PipelineCreation:: could be separate packages of the same bounded context that expose their public API while keeping implementation details private.
- A new bounded context like RemoteDevelopment:: can be represented by a single package, while large and complex bounded contexts like Ci:: would need to be organized into smaller, nested packages.
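As a rough illustration of a nested package that exposes a public API while hiding implementation details, here is a sketch using Ruby's built-in private_constant. The class and method names are hypothetical. Also, private_constant is only a runtime convention that can be bypassed, which is consistent with the earlier point that Ruby does not enforce boundaries out of the box. Static tools such as Packwerk are what the document relies on for enforcement.

```ruby
# Hypothetical nested package Ci::PipelineCreation inside the Ci bounded context.
module Ci
  module PipelineCreation
    # Implementation detail: not meant to be referenced outside the package.
    class ChainRunner
      def run(ref)
        "pipeline created for #{ref}"
      end
    end
    private_constant :ChainRunner

    # Public API of the package: the only entry point other code should use.
    # Unqualified access from inside the module is allowed for private constants.
    def self.execute(ref:)
      ChainRunner.new.run(ref)
    end
  end
end

puts Ci::PipelineCreation.execute(ref: "main") # prints pipeline created for main

begin
  Ci::PipelineCreation::ChainRunner # scoped access to a private constant
rescue NameError => e
  puts "blocked: #{e.class}"          # prints blocked: NameError
end
```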
References
List of references
The general steps of refactoring existing code toward modularization could be:
- Use the same namespace for all classes and modules related to the same bounded context.
  - Why? Without even a rough understanding of the domains at play in the codebase, it is difficult to draw up a plan.
    Having well-namespaced code that everyone else can follow is also the prerequisite for modularization.
  - If a domain is already well namespaced and no similar or related namespaces exist, we can move directly to the
    next step.
- Prepare Rails development for Packwerk packages. This is a one-off step, possibly with some improvements
  added over time.
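The first step depends on getting a rough map of the domains. One hypothetical way to start is to group constant names by their top-level namespace. This is a sketch: the input list is made up for illustration, and in practice it might come from scanning app/ and lib/.

```ruby
# Groups constant names by top-level namespace. Names without "::" are
# collected under :unnamespaced, since they are candidates for namespacing.
def group_by_top_level_namespace(constant_names)
  constant_names.group_by do |name|
    parts = name.split("::")
    parts.size > 1 ? parts.first : :unnamespaced
  end
end

# Illustrative input only; not an inventory of the real codebase.
constants = %w[
  Ci::Pipeline
  Ci::JobArtifacts::DestroyService
  Abuse::Report
  Spam::SpamVerdictService
  MergeRequestsFinder
]

group_by_top_level_namespace(constants).each do |namespace, names|
  puts "#{namespace}: #{names.join(', ')}"
end
```

A report like this can surface overlapping namespaces, such as Abuse and Spam, and un-namespaced classes before deciding on a bounded context for each.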
Historical context
Until May 2024 the GitLab codebase didn’t have a clear domain structure.
We had forced the creation of some modules
as a first step, but we didn’t have a well-defined strategy for doing it consistently.
The majority of the code was not properly namespaced and organized:
- Ruby namespaces used didn’t always represent the SSoT. We had overlapping concepts spread across multiple
namespaces. For example: Abuse:: and Spam::, or Security::Orchestration:: and Security::SecurityOrchestration.
- Domain code related to the same bounded context was scattered across multiple directories.
- Domain code was present in the lib/ directory under namespaces that differed from the same domain under app/.
- Some namespaces were very shallow, containing a few classes while other namespaces were very deep and large.
- A lot of the old code was not namespaced, making it difficult to understand the context where it was used.
In May 2024 we defined and enforced bounded contexts.
Background
This design document supersedes the previous Composable GitLab Codebase design document,
where we explored the idea of separating the codebase into technical runtime profiles:
for example, running the monolith solely as a Sidekiq node.
With a modular monolith and the use of a Hexagonal Architecture, we can achieve both
separation of domains and separation of application adapters, which may include the usage of engines and/or different runtime profiles.
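To make the split between domain and application adapters more concrete, here is a minimal sketch of the ports-and-adapters idea. All names are hypothetical and are not GitLab classes. The domain service depends only on a "port" (any object responding to #save), and a web-style adapter and a worker-style adapter both call the same domain code.

```ruby
module Projects
  # Domain logic: knows nothing about HTTP, Sidekiq, or the database.
  class ArchiveService
    # repository is the "port": any object that responds to #save(project).
    def initialize(repository:)
      @repository = repository
    end

    def execute(project)
      return :already_archived if project[:archived]

      @repository.save(project.merge(archived: true))
      :archived
    end
  end
end

# Infrastructure adapter implementing the port, kept in memory for the sketch.
class InMemoryProjectRepository
  attr_reader :saved

  def initialize
    @saved = []
  end

  def save(project)
    @saved << project
  end
end

# Application adapters: both delegate to the same domain service.
class ArchiveController
  def initialize(service)
    @service = service
  end

  def create(params)
    { status: @service.execute(params) }
  end
end

class ArchiveWorker
  def initialize(service)
    @service = service
  end

  def perform(project)
    @service.execute(project)
  end
end

repository = InMemoryProjectRepository.new
service = Projects::ArchiveService.new(repository: repository)
ArchiveController.new(service).create({ id: 1, archived: false })
ArchiveWorker.new(service).perform({ id: 2, archived: true })
```

Because the domain service does not depend on either adapter, a runtime profile that only needs background processing could, in principle, load the domain and the worker adapter without the web layer.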
Context
Before modularizing the codebase, we first needed to define how we were going to divide it.
Decision
We start by focusing on the application domain (backend business logic) leaving the
application adapters (Web controllers and views, REST/GraphQL endpoints) outside the
scope of the modularization initially.
The reasons for this are:
- Code in application adapters may not always align with a specific
domain. For example: a project settings endpoint or a merge request page contain
references to many domains.
- There was a need to run separate Rails nodes for the SaaS architecture using different
profiles in order to save on memory.
For example, on SaaS we wanted to be able to spin up more Sidekiq nodes without the need
to load the whole Rails application. The assumption is that for running Sidekiq we don’t
need ActionCable, REST endpoints, GraphQL mutations, or Rails views.
We only need the application domain and infrastructure code.
This could still be true even with the introduction of Cells, but
we need to re-evaluate this assumption.
- Keep the scope and effort smaller. Tackling only domain code is easier to understand than
the complexity of how to break down the application adapters and all their edge cases.
The decision to scope out application adapters is not final and we decided to defer
it to later.
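The runtime-profile assumption can be expressed as a table that lists the code layers each profile needs. The sketch below is a hypothetical illustration. The layer names loosely follow the examples in the text (ActionCable, REST endpoints, GraphQL, Rails views), and the profile table itself is an assumption. It leaves open how this would map to Rails loading or engines, since the ADR defers that decision.

```ruby
# Hypothetical layer names, loosely following the examples in the ADR text.
LAYERS = %i[infrastructure domain rails_views rest_api graphql action_cable].freeze

# Which layers each runtime profile needs; this table is an assumption.
PROFILES = {
  web: LAYERS,
  sidekiq: %i[infrastructure domain]
}.freeze

def layers_for(profile)
  PROFILES.fetch(profile) { raise ArgumentError, "unknown profile: #{profile}" }
end

# Layers a profile could skip loading, for example to save memory.
def skipped_layers(profile)
  LAYERS - layers_for(profile)
end

puts skipped_layers(:sidekiq).inspect # prints [:rails_views, :rest_api, :graphql, :action_cable]
```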
Context
With the focus primarily on the application domain we needed to define how to
modularize it.
Decision
The application domain is divided into bounded contexts, which define the top-level
modules of the GitLab application. The term bounded context is widely used in
Domain-Driven Design.
Defining bounded contexts means to organize the code around product structure rather than
organizational structure.
From the research in Proposal: split GitLab monolith into components,
it seems that following product categories as a guideline
would be much better than translating the organizational structure into a folder structure (for example, app/modules/verify/pipeline-execution/...).
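A single in-code source of truth for bounded contexts could be as simple as a registry that maps each context to the feature categories it covers. The entries and method names below are hypothetical and are not GitLab's actual list. The duplicate check shows one way to keep each category owned by exactly one context.

```ruby
# Hypothetical registry: bounded context => feature categories it covers.
BOUNDED_CONTEXTS = {
  "Ci" => %w[continuous_integration pipeline_composition],
  "MergeRequests" => %w[code_review_workflow],
  "Packages" => %w[package_registry]
}.freeze

# Returns the bounded context that owns a feature category, or nil.
def bounded_context_for(feature_category)
  BOUNDED_CONTEXTS.find { |_context, categories| categories.include?(feature_category) }&.first
end

# Feature categories claimed by more than one bounded context.
def duplicated_categories
  BOUNDED_CONTEXTS.values.flatten.tally.select { |_category, count| count > 1 }.keys
end

puts bounded_context_for("code_review_workflow") # prints MergeRequests
```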
Context
How do we assign stewardship to domain and platform modules? We have a large amount of shared code
that does not have explicit stewards who can provide a vision and direction for that part of the code.
Decision
We use the term stewards instead of owners to be more in line with the GitLab principle that
everyone can contribute. Stewards are caretakers of the code. They know how a specific
functionality is designed and why. They know the architectural characteristics and constraints.
However, they welcome changes and guide contributors towards success.
Modularization of our monolith is a complex project. There will be many
unknowns. One thing that can help us mitigate the risks and gain key
insights is delivering proof-of-concepts early on, to better
understand what will need to be done.
Inter-module communication
A PoC that we plan to deliver is a PoC of inter-module communication. We do
recognize the need to separate modules, but still allow them to communicate
with each other through a well-defined interface. Modules can communicate through
facade classes (as libraries usually do) or through an eventing system. Both
ways are important.
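The two communication styles can be sketched side by side. This is a hypothetical, minimal illustration, not GitLab's eventing system. A facade module gives callers a single synchronous entry point. A tiny in-process event bus lets one module publish an event without knowing who subscribes. In this sketch, events are delivered synchronously.

```ruby
# Minimal in-process event bus; handlers run synchronously on publish.
class EventBus
  def initialize
    @subscribers = Hash.new { |hash, key| hash[key] = [] }
  end

  def subscribe(event_name, &handler)
    @subscribers[event_name] << handler
  end

  def publish(event_name, payload)
    @subscribers[event_name].each { |handler| handler.call(payload) }
  end
end

module MergeRequests
  # Facade: the only entry point other modules are expected to call.
  module Api
    def self.mergeable?(merge_request)
      merge_request[:approvals] >= merge_request[:required_approvals]
    end
  end
end

module Ci
  class PipelineCompletion
    def initialize(bus)
      @bus = bus
    end

    # Publishes an event instead of calling other modules directly.
    def complete(pipeline_id)
      @bus.publish(:pipeline_completed, { pipeline_id: pipeline_id })
    end
  end
end

bus = EventBus.new
notifications = []
bus.subscribe(:pipeline_completed) { |payload| notifications << payload[:pipeline_id] }

Ci::PipelineCompletion.new(bus).complete(42)
puts notifications.inspect                                                # prints [42]
puts MergeRequests::Api.mergeable?({ approvals: 2, required_approvals: 1 }) # prints true
```

The facade keeps synchronous queries explicit. The event keeps the publisher unaware of its consumers, which is the looser of the two couplings.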
Internal Slack Channels
Reference Implementations / Guides
Gusto / RubyAtScale: