Architecture

Complexity at Scale

As GitLab grows, through the introduction of new features and improvements on existing ones, so does its complexity. This effect is compounded by the care and feeding of a single codebase that supports the wide variety of environments in which it runs, from small self-managed instances to large installations such as GitLab.com. The company itself adds to this complexity from an organizational perspective: hundreds of employees worldwide contribute in one way or another to both the product and the company, using GitLab.com on a daily basis to do their jobs. Team members in Engineering are directly responsible for the codebase and its operation, for the infrastructure powering GitLab.com, and for the support of customers running self-managed instances. Likewise, team members in the Product organization chart the future of the product.

Our values, coupled with strong know-how and unparalleled dedication, provide critical guidance to manage these complexities. At the same time, we have known for some time that we are crossing a threshold in which complexity at scale, both technical and organizational, is playing a significant role. We know we need to fine-tune both our technical discipline (so we can integrate it across the organization) and our organizational amplification (so we can span and leverage the entire organization) to ensure we can continue to deliver on our values. In this context, we have been exploring the adoption of Architecture or Engineering Practice to help us in this regard.

Architecture

Martin Fowler’s Software Architecture Guide provides an excellent discussion on the notion of Architecture, and there is much to be learned from this and other sources. The question before us is, then, how to contextualize those learnings and apply them at GitLab.

Much like the rest of the software world, we have been wary of all the negative baggage that Architecture implies, particularly as some of that baggage would seemingly fly in the face of our values. This is why we have taken the time to carefully consider what Architecture means for us, and how to implement it in alignment with our values and at the scale that both the product and the company demand.

At GitLab, Architecture is not a dedicated role (i.e., no such title exists in the company). We understand Architecture as a component of all technical roles, a set of practices to leverage the vast amount of experience distributed across the company, and a workflow to ensure we can continue to scale efficiently.

Architecture at GitLab

At GitLab, Architecture is a collaborative process. It is also:

  • A collection of practices that provide technical frameworks to guide (rather than dictate) our thinking, design, and discussions so we can iterate quickly and deliver results. These include the Scalability Practice. Others are in the works (such as the Availability Practice).
  • A collaborative workflow that provides the necessary organizational solution to foster inclusion, and drive ideas and priorities from all corners of the company.
  • A collection of design documents and roadmaps which are artifacts resulting from the Architecture Design Workflow.

Such a definition implies a solid reliance on collaboration rather than authority to efficiently and transparently drive decisions, engage stakeholders, and promote trust across the organization.

Artifacts: roadmaps and design documents

We strive for results and concrete outcomes, which in this case entail roadmaps and design documents. Roadmaps are documents that aggregate many design documents.

Architecture as a practice is everyone’s responsibility

Architecture at GitLab is not a dedicated role, but it is notably ingrained in senior technical leadership roles, where each role's level and sphere of influence determine its responsibilities within the practice.

Architecture Design Workflow

The Architecture Design Workflow is used to create design documents that keep team members aligned across multiple iterations.

Roadmap

Following our Transparency value, our architecture roadmap and design documents are public.


Architecture Design Documents

Design documents are the primary artifact that the Architecture Design Workflow revolves around. A design document describes a technical vision and a set of principles that will guide feature implementation as we move forward. It acts as a set of guardrails to keep the team aligned.

Design documents are version-controlled and are updated with new insights and knowledge after every iteration, so that they become more useful over time.

Contributing

At GitLab, everyone can contribute, including to our design documents. If you would like to contribute to any of these documents, feel free to:

Architecture Design Workflow

As engineers at GitLab, we lead the evolution of software, constantly working to find the right balance between proactive work, reactive work, and innovation. We strive to determine what work is important and what work is not, leveraging knowledge from those who know the most about GitLab, and empowering people to work on things that make everyone more productive. Experimenting and innovating are core to how we work, and we focus on collaboration, results, and iteration to achieve our goals.

Guidelines
Practices
Technology Roadmap

As GitLab continues to grow and mature, it is approaching a pivotal point in which faster growth across multiple large sites and an emphasis on the enterprise will become the main challenges to contend with over the next 12 months, all within the context of a relentless focus on security, availability, and performance.

To meet these challenges, we need to attain extremely high levels of predictability when operating the environments. This entails stricter standardization while providing the required flexibility to support both self-managed instances and GitLab sites, which in turn enables advanced automation through advanced tooling. But even with a high level of confidence in predictability, risk is a factor we must manage: things will inevitably go wrong when effecting change in the environments. Thus, we must be able to quantify acceptable risk, to react fast in the face of failure to restore stability, and to do so as a single Engineering unit.