Security Architecture

Overview

Security Architects are the trusted security advisors of GitLab Engineering. Security Architecture is a natural extension of the greater Architecture initiative at GitLab. It is the preliminary and necessary work to build software with security considerations.

Objectives

Security Architecture protects the organization from cyber harm and supports present and future business needs.

The process is designed with these constraints in mind:

  • aligned with our values
  • asynchronous
  • self-service as much as possible
  • avoid being a bottleneck in the software development life cycle
  • deliberately simple and concise
  • automated as much as possible
  • DRY with strong notes

Scope of Security Architecture

The scope covers any change in our product offering (whether it is a feature, a service, or an acquisition) that would impact our security posture. Our security posture is defined by:

  • the components we build upon
  • the components we embed
  • everything infrastructure
  • 3rd party services
  • software architecture
  • reference architectures

Security Architecture Requirements

Application Security

The Application Security team provides guidelines and requirements to follow throughout the life cycle of source code.

InfraSec

Compliance

Cryptography

Security Architecture Principles

The Security Architecture Principles are neither requirements nor decisions, but something in between.

Our principles are based on two simple pillars:

  1. Least privilege
  2. Network isolation

They are detailed below with the principles taken from the book Software Systems Architecture (see references) and this ACCU 2019 related video. These are very close to the OWASP Security Design Principles but are easier to understand and apply.

Assign the least privilege possible

Why

Broad privileges allow malicious or accidental access to protected resources.

How

  • Give a user or service only the minimum level of access rights (privileges) necessary to complete an assigned operation, and only for the minimum amount of time necessary to complete it.
  • Do not use administrative accounts for application access
  • Use separate accounts for sensitive data

Examples

  • Run service processes as their own users with exactly the set of privileges they require
  • Grant read-only permissions when no updates are required
  • When updates are required, limit the scope to the target resource only
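
The read-only example above can be sketched in Python with the standard `sqlite3` module. This is a minimal illustration, not a description of how GitLab services are configured: the `commits` table, the file name, and the helper names are hypothetical. It relies on SQLite's `mode=ro` URI parameter so that the reading component cannot write, even by mistake.

```python
import os
import sqlite3
import tempfile
from pathlib import Path


def open_read_only(db_path: str) -> sqlite3.Connection:
    """Open an SQLite database in read-only mode."""
    # `mode=ro` makes SQLite reject every write on this connection.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    return sqlite3.connect(uri, uri=True)


def demo() -> tuple[list, bool]:
    # Hypothetical setup: a throwaway database with a single table.
    db_path = os.path.join(tempfile.mkdtemp(), "reports.db")
    writer = sqlite3.connect(db_path)
    writer.execute("CREATE TABLE commits (sha TEXT)")
    writer.execute("INSERT INTO commits VALUES ('abc123')")
    writer.commit()
    writer.close()

    # The reporting component only needs to read, so it gets a read-only handle.
    reader = open_read_only(db_path)
    rows = reader.execute("SELECT sha FROM commits").fetchall()
    try:
        reader.execute("INSERT INTO commits VALUES ('evil')")
        write_blocked = False
    except sqlite3.OperationalError:
        # SQLite refuses the write on a read-only connection.
        write_blocked = True
    finally:
        reader.close()
    return rows, write_blocked
```

The same idea applies to any datastore: the privilege is restricted where the connection is created, rather than relying on callers to behave.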

Separate responsibilities

Why

Limit the blast radius of successful attacks: When one part of the system is compromised, the whole system is not.

Make attacks less attractive.

How

  • Compartmentalize responsibilities and privileges
  • Separation of duties: the successful completion of a single task is dependent upon two or more conditions
  • Don’t store secrets along with other non-sensitive data (like settings), even if secrets are filtered out

Examples

  • A system/service that only needs to read git commits should not be able to access user data
  • GitLab team members don’t have access to billing data, nor to anything else classified as red data

Trust cautiously

Why

  • Many security problems are caused by malicious intermediaries inserted in the communication path

How

  • Assume unknown entities are untrusted
  • Have a clear process to establish trust
  • Validate who or what is connecting
  • Always use a kind of authentication (certificate, password, …)
  • Network controls
  • Do not dynamically load 3rd party code

Examples

  • Services can’t be considered secure just because they are not exposed to the Internet. SSRF can let attackers freely access them.
  • The best way to authenticate users is to apply this general security principle: provide something you know (e.g., a password) and something you own (e.g., a certificate). This is what we apply with MFA, for example by providing a password you know, along with a TOTP that is generated by an application.
  • Downloading 3rd party libraries or scripts at runtime can lead to many security issues, including cache poisoning and XSS. Without checking the integrity of the external asset, malicious actors can tamper with the files, as in this example of BGP hijacking
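
As a rough sketch of the integrity check mentioned in the last example, the snippet below compares a fetched asset against a SHA-256 digest pinned ahead of time. The function names and the idea of storing the pinned digest alongside the code are assumptions made for illustration; fetching the asset is deliberately left out.

```python
import hashlib
import hmac


def verify_asset(content: bytes, expected_sha256_hex: str) -> bool:
    """Return True only if `content` matches the pinned SHA-256 digest."""
    actual = hashlib.sha256(content).hexdigest()
    # compare_digest avoids leaking the position of the first mismatch.
    return hmac.compare_digest(actual, expected_sha256_hex.lower())


def load_asset(content: bytes, expected_sha256_hex: str) -> bytes:
    """Refuse to use an asset whose digest does not match the pinned value."""
    if not verify_asset(content, expected_sha256_hex):
        raise ValueError("asset integrity check failed; refusing to load")
    return content
```

The pinned digest must come from a trusted source (for example, reviewed and committed with the code), not from the same location as the asset itself.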

Simplest solution possible

Why

  • Simple solutions are easier to deploy, maintain, and secure
  • Aligned with our Iteration and Efficiency values
  • Security requires understanding of the design
  • Complexity increases exponentially
  • Attack-ability or attack surface of the software is reduced

How

  • Avoid complex failure modes, implicit behaviours, unnecessary features
  • Use well-known, tested, and proven components
  • Avoid over-engineering and strive for MVCs instead

Examples

  • Introducing a new server in GitLab means updating Omnibus builds, Helm charts, our reference architectures, our docs, and so on. This cost must be balanced carefully against the benefits of adding a component that seems to be a perfect fit.

Audit sensitive events

Why

  • Provide record of activity
  • Deter wrongdoing
  • Provide a log to reconstruct the past
  • Provide a monitoring point

How

  • Record all security significant events in a tamper-resistant store
  • Provide notifications for all sensitive events

Examples

  • Enable GuardDuty in AWS or Cloud Audit Logs in GCP to record activity and detect malicious intent.
  • Leverage Panther (for gitlab.com only) to collect, normalize, and analyze logs.
  • Provide notifications to users when:
    • Changes are made to their accounts
    • New keys are generated or added to their accounts
  • Generate security events (could be Slack notifications) for unusual activity:
    • Signal passing a threshold (rate limiting in action)
    • Component signature not matching
    • Unauthorized access to sensitive resources
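
One way to make an audit trail tamper-evident is to chain each entry to the hash of the previous one. The sketch below illustrates the idea only; the entry layout and function names are hypothetical, and it does not describe how GuardDuty, Cloud Audit Logs, or Panther store events.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder hash used before the first entry


def _entry_hash(prev_hash: str, event: dict) -> str:
    # Serialize deterministically so the same event always hashes the same way.
    payload = json.dumps(event, sort_keys=True)
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


def append_event(log: list[dict], event: dict) -> None:
    """Append an event, linking it to the hash of the previous entry."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    log.append({"event": event, "prev": prev_hash,
                "hash": _entry_hash(prev_hash, event)})


def verify_chain(log: list[dict]) -> bool:
    """Return False if any entry was modified, removed, or reordered."""
    prev_hash = GENESIS
    for entry in log:
        if entry["prev"] != prev_hash:
            return False
        if entry["hash"] != _entry_hash(prev_hash, entry["event"]):
            return False
        prev_hash = entry["hash"]
    return True
```

A chain like this only detects edits. An attacker able to rewrite the whole log can recompute it, so the latest hash should also be stored somewhere the logging system cannot modify.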

Fail securely & use secure defaults

Why

  • Default passwords, ports and rules are “open doors”
  • Failure and restart states often default to “insecure”

How

  • Force changes to security sensitive parameters
  • Think through failures - to be secure but recoverable
  • Unless a subject is given explicit access to an object, it should be denied access to that object, aka Fail Safe Defaults.

Examples

  • Do not trust invalid/expired TLS certificates
  • Some components like Grafana come with a default admin/admin user/password.
  • Related to above, some components might fail over to a plain user/password authentication (with default credentials) under certain conditions, like a service not reachable.
  • Some frameworks tend to render error pages with details that should not be shared, like hostnames and paths, when they cannot connect to some resources.
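
The Fail Safe Defaults rule can be sketched as an access check that denies unless an explicit grant exists, and that also denies when the check itself fails. The grant table and function names below are hypothetical.

```python
from typing import Callable

# Hypothetical grant table: (subject, object) -> set of allowed actions.
Grants = dict[tuple[str, str], set[str]]


def is_allowed(grants: Grants, subject: str, obj: str, action: str) -> bool:
    """Deny by default: only an explicit grant yields True."""
    return action in grants.get((subject, obj), set())


def check_access(lookup: Callable[[str, str, str], bool],
                 subject: str, obj: str, action: str) -> bool:
    """Fail closed: if the policy lookup errors out, deny access."""
    try:
        return lookup(subject, obj, action) is True
    except Exception:
        # An unreachable policy service must never turn into "allow".
        return False
```

Denying on error keeps the system secure during outages; making it recoverable (retries, alerts) is a separate concern that this sketch does not cover.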

Never rely upon obscurity

Why

  • Hiding things is difficult; someone is going to find them, accidentally or on purpose
  • We’re a very transparent company and are more likely to share implementation details, sometimes leaking something sensitive.
  • Offboarded employees leave with sensitive knowledge. While tokens can be rotated, we can’t ensure this knowledge won’t leak

How

  • Assume an attacker has perfect knowledge

Examples

  • Recon can help attackers find servers that are not publicly documented. These servers could expose vulnerable components, and lead to east-west movement.
  • Changing the path to an admin section won’t prevent attackers from finding it eventually.

Implement defense in depth

Why

  • Systems do get attacked, breaches do happen, mistakes are made
  • Minimize blast radius: One component compromised should not compromise the whole system
  • Prevent SSRF

How

  • Don’t rely on a single point/layer of security:
    • Secure every level
    • Stop failures at one level from propagating
  • Encrypt data at rest and in transit
  • Use vulnerability scanners
  • Close unnecessary ports and disable unused features

Examples

  • A resource is well protected when accessed via the UI, but could be more exposed via the API.
  • Accounts are locked after too many failed attempts, in order to prevent brute-force attacks.
  • OS command execution can bypass all application security layers, because the execution occurs outside of the application.
  • Unnecessary open ports and enabled features may lead to authentication bypass and other weaknesses. They increase the exposure of an application.
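
The account lockout example can be sketched as an extra layer that sits in front of credential checking. The class name, the threshold, and the in-memory counter below are assumptions for illustration; a real deployment would also need persistence and an unlock policy.

```python
from dataclasses import dataclass, field


@dataclass
class LoginGuard:
    """Counts failed logins per account and locks after too many."""
    max_failures: int = 5
    failures: dict[str, int] = field(default_factory=dict)

    def is_locked(self, account: str) -> bool:
        return self.failures.get(account, 0) >= self.max_failures

    def attempt(self, account: str, credentials_ok: bool) -> bool:
        """Return True only if the login is accepted."""
        if self.is_locked(account):
            # Even correct credentials are refused while locked.
            return False
        if credentials_ok:
            self.failures.pop(account, None)  # reset the counter on success
            return True
        self.failures[account] = self.failures.get(account, 0) + 1
        return False
```

The guard does not replace password verification or MFA; it is one more layer that limits what an attacker gains if the others are weak.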

Never invent security technology

Why

  • Security technology is difficult to create, and avoiding vulnerabilities is difficult
  • It takes years to secure and mature new security technologies
  • They are expected to be perfect (sort of)

How

  • Do not roll your own crypto
  • Use well-known and proven components
  • When in doubt, always involve the right SMEs

Examples

  • Do not implement SSO from scratch
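
As an illustration of using proven components instead of inventing them, the sketch below stores passwords with PBKDF2 from Python's standard `hashlib`, rather than a home-grown hashing scheme. The function names and the iteration count are assumptions to be tuned with the relevant SMEs; this is not a statement of how GitLab stores passwords.

```python
import hashlib
import hmac
import secrets

ITERATIONS = 600_000  # assumed work factor; tune for your hardware and policy


def hash_password(password: str, *,
                  iterations: int = ITERATIONS) -> tuple[bytes, bytes, int]:
    """Derive a password hash with a fresh random salt."""
    salt = secrets.token_bytes(16)  # CSPRNG, not the `random` module
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"),
                                 salt, iterations)
    return salt, digest, iterations


def verify_password(password: str, salt: bytes,
                    expected: bytes, iterations: int) -> bool:
    """Recompute the hash and compare it in constant time."""
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"),
                                    salt, iterations)
    return hmac.compare_digest(candidate, expected)
```

Every primitive here (key derivation, random salt, constant-time comparison) comes from a vetted library; the application code only wires them together.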

Find the weakest link

Why

  • A system is only as secure as its weakest link
  • Over time, new vulnerabilities are discovered, and a component might suddenly become the new weak link

How

  • Threat model the system, repeat, iterate.
  • Identify central components that
    • share more privileges than the others
    • have more connections to other components
    • are entrypoints (login modules, APIs, …)
  • Run Dependency Scanning
  • Avoid weak ciphers and algorithms
  • Sometimes consider the humans (users) as the weakest link. Phishing is still widely used for a good reason

Examples

  • Some resources are very well protected in the UI, and never exposed to unauthorized users. Yet, if the API is not correctly implementing security controls, these resources could be passed as raw models without filtering sensitive data.
  • Data encrypted in transit but not at rest.
  • The weakest link could also be a user. Not enforcing strong passwords and MFA could lead to sensitive data exposure, but users can also do harmful actions without being aware of it.
  • OS (system) commands often lead to bypassing most, if not all, of the security controls of an application. They are a common vector for RCEs and should be avoided as much as possible.
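
When a system command truly cannot be avoided, one common mitigation is to pass arguments as a list and never route untrusted input through a shell. The sketch below is hypothetical: it spawns the current Python interpreter as a stand-in for an external tool, and the function name is made up for the example.

```python
import subprocess
import sys


def echo_via_child(untrusted: str) -> str:
    """Pass untrusted input to a child process as one literal argument."""
    result = subprocess.run(
        # List form with no shell: metacharacters such as `;` or `$(...)`
        # are never interpreted, they reach the child as plain text.
        [sys.executable, "-c",
         "import sys; sys.stdout.write(sys.argv[1])", untrusted],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

This only prevents shell injection. The invoked program still runs outside the application's security controls, so replacing the command with a library call remains the preferable option.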

Security Architecture reviews

As part of the Production Readiness Process, it is highly recommended to include a Security Architecture review.

The Security Architecture review process is detailed on this page.

Measuring results

Security Architecture, by nature, doesn’t generate measurable data, apart from the number of architecture diagrams and reviews. While this could be used as a metric, it only reflects workload, not achievements. Instead, we measure success in terms of maturity.

The OWASP SAMM framework is currently used, but this is subject to change (see discussions in this issue).

Communication channels

References


Security Architecture review process

Overview

Security Architecture review is a holistic assessment of security layers across infrastructure, application, people, and processes.

Purpose

When to conduct a Security Architecture review?

The review process is integrated into the broader Architecture workflow, but can be triggered for:

Zero Trust

As part of raising that bar, GitLab is implementing Zero Trust, or the practice of shifting access control from the perimeter of the org to the individuals, the assets and the endpoints. You can learn more about this strategy from the Google BeyondCorp whitepaper: A New Approach to Enterprise Security.

In our case, Zero Trust means that all devices trying to access an endpoint or asset within our GitLab environment will need to authenticate and be authorized. Because Zero Trust relies on dynamic, risk-based decisions, this also means that users must be authorized and validated: what department are they in, what role do they have, how sensitive is the data and the host that they are trying to access? We’re at the beginning stages in our Zero Trust roadmap, but as we move along in the journey, we’ll document our lessons learned, process and progress in our Security blog.

Last modified November 14, 2024: Fix broken external links (ac0e3d5e)