Reusable Rapid Diffs (RRD)

This page contains information related to upcoming products, features, and functionality. It is important to note that the information presented is for informational purposes only. Please do not rely on this information for purchasing or planning purposes. The development, release, and timing of any products, features, or functionality may be subject to change or delay and remain at the sole discretion of GitLab Inc.
Status Authors Coach DRIs Owning Stage Created
proposed patrickbajao igor.drozdov jerasmus iamphill slashmanov psjakubowska thomasrandolph ntepluhina devops create 2023-10-10

Summary

Diffs at GitLab are spread across several places, with each area using its own method. We aim to develop a single, performant way to render diffs across the application, and to improve all areas of diff rendering, from the backend creation of diffs to their rendering on the frontend.

All the diffs features related to this document are listed on a dedicated page.

Work breakdown

Rapid Diffs work is split into 3 stages and can be tracked in the following epics:

  1. Stage 0 — foundation:
    • Have foundational components in place.
    • Stream diffs on MR, commit and compare revisions pages.
  2. Stage 1 — baseline features:
    • Most of the features are working (discussions, navigation, review, etc.)
  3. Stage 2 — production ready:
    • Feature specs pass against Rapid Diffs
    • Full accessibility compliance

Motivation

Goals

  • improved perceived performance
  • improved maintainability
  • consistent coverage of all scenarios

Non-Goals

This effort will not:

  • Identify improvements for the current implementation of diffs, either in Merge Requests or in Repository Commits.

Priority of Goals

All goals are important, but to help us make consistent choices we defined the following order of priority.

Perceived performance ranks above improved maintainability, which ranks above consistent coverage.

Examples:

  • a proposal improves maintainability at the cost of perceived performance: ❌ we should consider an alternative.
  • a proposal removes a feature from certain contexts, hurting coverage, and has no impact on perceived performance or maintainability: ❌ we should re-consider.
  • a proposal improves perceived performance but removes features from certain contexts of usage: ✅ it’s valid and should be discussed with Product/UX.
  • a proposal guarantees consistent coverage and has no impact on perceived performance or maintainability: ✅ it’s valid.

In essence, we’ll strive to meet every goal at each decision but prioritise the higher ones.
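One way to read this ordering is as a lexicographic rule: the highest-priority goal that a proposal affects decides the outcome. The sketch below encodes that reading and checks it against the four examples above. The `Proposal` type, the -1/0/+1 impact scale, and the choice to accept a fully neutral proposal are assumptions made for illustration; they are not part of this blueprint.

```python
from dataclasses import dataclass

# Goals in priority order, highest first (see "Priority of Goals").
PRIORITY = ("perceived_performance", "maintainability", "coverage")


@dataclass(frozen=True)
class Proposal:
    """Hypothetical impact on each goal: -1 hurts, 0 is neutral, +1 improves."""
    perceived_performance: int = 0
    maintainability: int = 0
    coverage: int = 0


def is_acceptable(proposal: Proposal) -> bool:
    """The highest-priority goal with a non-zero impact decides the outcome."""
    for goal in PRIORITY:
        impact = getattr(proposal, goal)
        if impact != 0:
            return impact > 0
    return True  # assumption: a proposal that affects no goal is not rejected


# The four examples from the list above, in the same order.
examples = [
    Proposal(perceived_performance=-1, maintainability=1),
    Proposal(coverage=-1),
    Proposal(perceived_performance=1, coverage=-1),
    Proposal(coverage=1),
]
verdicts = [is_acceptable(p) for p in examples]
```

An accepted proposal that removes features would still need the Product/UX discussion mentioned above; the rule only captures the ordering.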

Process

Workspace & Artifacts

  • We will store implementation details like metrics, budgets, and development & architectural patterns here in the docs
  • We will store large bodies of research, the results of audits, etc. in the wiki of the RRD project
  • We will store audio & video recordings on the public YouTube channel in the Code Review / RRD playlist
  • We will store drafts, meeting notes, and other temporary documents in public Google docs

Proposal

The new approach proposed here changes what we have done in the past by doing the following:

  1. Stop using virtualized scrolling for rendering diffs.
  2. Move most of the rendering work to the server.
  3. Enhance server-rendered HTML on the client.
  4. Unify the diffs codebase across all pages that render diffs (merge request, repository commits, compare revisions, and any others).

Definitions

Maintainability

Maintainable projects are simple projects.

Simplicity is the opposite of complexity. This uses a definition of simple and complex described by Rich Hickey in “Simple Made Easy” (Strange Loop, 2011).

  • Maintainable code is simple (single task, single concept, separate from other things).
  • Maintainable projects expand on simple code by having simple structure (folders define classes of behaviors, e.g. you can be assured that a component directory will never initiate a network call, because that would be conflating visual display with data access).
  • Maintainable applications flow out of simple organization and simple code. As the old saying goes, a cluttered desk reflects a cluttered mind. Rigorous discipline on simplicity will be represented in our output (the product). By being strict about working simply, we will naturally produce applications where our users can more easily reason about their behavior.

Done

GitLab has an existing definition of done which is geared primarily toward identifying when an MR is ready to be merged.

In addition to the items in the GitLab definition of done, work on RRD should also adhere to the following requirements:

  • Meets or exceeds all metrics
    • Meets or exceeds our minimum accessibility metrics (these are explicitly not part of our defined priorities, because they are non-negotiable)
  • All work is fully documented for engineers (user documentation is a requirement of the standard definition of done)

Acceptance Criteria

To measure our success, we need to set meaningful metrics. These metrics should meaningfully and positively impact the end user.

  1. Meets or exceeds WCAG 2.2 AA.
  2. Meets or exceeds ATAG 2.0 AA.
  3. The RRD app loads less than or equal to 300 KiB of JavaScript (compressed / “across-the-wire”) [1].
  4. The RRD app loads less than or equal to 150 KiB of markup, images, styles, fonts, etc. (compressed / “across-the-wire”) [1].
  5. The Time to First Diff (mr-diffs-mark-first-diff-file-shown) happens before the 3-second mark.
  6. The RRD app can execute in total isolation from the rest of the GitLab product:
    1. “Execute” means the app can load, display data, and allow user interaction (“read-only”).
    2. If a part of the application is only used in merge requests or diffs, it is considered part of the Diffs application.
    3. If a part of the application must be brought in from the rest of the product, it is not considered part of the Diffs load (as defined in metrics 3 and 4).
    4. If a part of the application must be brought in from the rest of the product, it may not block functionality of the Diffs application.
    5. If a part of the application must be brought in from the rest of the product, it must be loaded asynchronously.
    6. If a part of the application meets 6.1-6.5 (such as: the Markdown editor is loaded asynchronously when the user would like to leave a comment on a diff) and its inclusion causes a budget overflow:
      • It must be added to a list of documented exceptions that we accept are out of bounds and out of our control.
      • The exceptions list should be addressed on a regular basis to determine the ongoing value of overflowing our budget.

1: The Performance Inequality Gap, 2023
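The numeric budgets in criteria 3, 4, and 5 lend themselves to an automated check. The following is a minimal sketch of such a check, assuming measurements are already available as a plain mapping; the metric key names and the `budget_violations` helper are hypothetical and not part of any existing tooling.

```python
from typing import List, Mapping

KIB = 1024

# (limit, inclusive) per metric. Criteria 3 and 4 allow "less than or equal to";
# criterion 5 requires the first diff to appear *before* the 3-second mark.
BUDGETS = {
    "javascript_bytes": (300 * KIB, True),
    "other_assets_bytes": (150 * KIB, True),
    "time_to_first_diff_ms": (3000, False),
}


def budget_violations(measured: Mapping[str, float]) -> List[str]:
    """Return the names of the metrics that exceed their budget."""
    violations = []
    for name, (limit, inclusive) in BUDGETS.items():
        value = measured[name]
        within = value <= limit if inclusive else value < limit
        if not within:
            violations.append(name)
    return violations


# Example: JavaScript exactly on budget, other assets one byte over,
# first diff shown exactly at (not before) the 3-second mark.
sample = {
    "javascript_bytes": 300 * KIB,
    "other_assets_bytes": 150 * KIB + 1,
    "time_to_first_diff_ms": 3000,
}
sample_violations = budget_violations(sample)
```

Parts of the product covered by the documented exceptions list would need to be excluded from the measured totals before running a check like this.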

Frontend

Ideally, we would meet our definition of done and our accountability metrics on our first try. We also need to continue to stay within those boundaries as we move forward. To ensure this, we need to design an application architecture that:

  1. Is:
    1. Scalable.
    2. Malleable.
    3. Flexible.
  2. Considers itself a mission-critical part of the overall GitLab product.
  3. Treats itself as a complex, unique application with concerns that cannot be addressed as side effects of other parts of the product.
  4. Can handle data access/format changes without making UI changes.
  5. Can handle UI changes without making data access/format changes.
  6. Provides a hookable, inspectable API and avoids code coupling.
  7. Separates:
    • State and application data.
    • Application behavior and UI.
    • Data access and network access.

Design and implementation details

Overview

Reusable Rapid Diffs introduce a change in responsibilities for both frontend and backend.

The backend will:

  1. Prepare diffs data.
  2. Highlight diff lines.
  3. Render diffs as HTML and stream them to the browser.
  4. Embed diffs metadata into the final response.

The frontend will:

  1. Enhance existing and future diffs HTML.
  2. Handle streamed diffs HTML.
  3. Enhance diffs HTML with dynamic controls to enable user interaction.

Static and dynamic separation

To achieve the separation of concerns, we should distinguish between static and dynamic UI on the page:

  • Everything that is static should always be rendered on the server.
  • Everything dynamic should be enhanced on the client.

Data that should be coming with the page:

  • Static diff file metadata: viewer type, added and removed lines, etc.
  • Edit permissions

Data that should be served through additional requests:

  • Discussions
  • File browser tree
  • Line expansion HTML
  • Full file HTML
  • Code quality
  • Code coverage
  • Everything else

We should return HTML for line expansion and view full file features. Other requests should return normalized data in JSON format.
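As a small illustration of the rule above, the sketch below maps a requested resource to the response format it should use. The resource identifiers are hypothetical names chosen for this example; only the split between HTML and JSON comes from this section.

```python
# Requests that should return server-rendered HTML (from this section).
# The identifiers themselves are hypothetical.
HTML_RESOURCES = frozenset({"line_expansion", "full_file"})


def response_content_type(resource: str) -> str:
    """Line expansion and full file return HTML; everything else returns JSON."""
    if resource in HTML_RESOURCES:
        return "text/html"
    return "application/json"


formats = {
    name: response_content_type(name)
    for name in ("line_expansion", "full_file", "discussions", "code_quality")
}
```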

The code suggestion feature should use the existing HTML of the diff, similar to the current implementation.

Performance optimizations

To improve the perceived performance of the page we should implement the following techniques:

  1. Limit the number of diffs rendered on the page at first.
  2. Use HTML streaming to render the rest of the diffs.
    1. Use Web Components to hook into diff files appearing on the page.
  3. Apply content-visibility whenever possible to reduce redraw overhead.
  4. Render diff discussions asynchronously.
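The first two techniques can be sketched as follows: render a limited number of files with the initial response, then produce the remaining files lazily so they can be written to the response as they become ready. This is an illustration only. The `<diff-file>` element name, the `data-path` attribute, and the helper names are assumptions, and a plain generator stands in for a real streaming response.

```python
import html
from typing import Iterable, Iterator, List, Tuple


def split_initial_and_streamed(
    paths: List[str], initial_limit: int
) -> Tuple[List[str], List[str]]:
    """Files rendered with the page versus files streamed afterwards."""
    return paths[:initial_limit], paths[initial_limit:]


def render_diff_file(path: str) -> str:
    """Stand-in for the server-side template.

    <diff-file> is a hypothetical custom element that the client could
    use to hook into each file as it appears on the page.
    """
    return f'<diff-file data-path="{html.escape(path, quote=True)}"></diff-file>'


def stream_remaining(paths: Iterable[str]) -> Iterator[str]:
    """Yield one rendered file at a time instead of building the full page."""
    for path in paths:
        yield render_diff_file(path)


initial, remaining = split_initial_and_streamed(["a.rb", "b.js", "c.md"], 2)
initial_html = [render_diff_file(path) for path in initial]
streamed_html = list(stream_remaining(remaining))
```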

Page & Data Flows

These diagrams document the flows necessary to display diffs and to allow user interactions and user-submitted data to be gathered and stored. In other words: this page documents the bi-directional data flow for a complete, interactive application that allows diffs to display and users to collaborate on diffs.

Critical Phases

  1. Gitaly
  2. Database
  3. Diff Storage
  4. Cache
  5. Back end
  6. Web API
  7. Front end*

flowchart LR
    Gitaly
    DB[Database]
    Cache
    DS[Diff Storage]
    FE[Front End]
    Display

    Gitaly <--> BE
    DB <--> BE
    Cache <--> BE
    DS <--> BE
    BE <--> API
    API <--> FE
    FE --> Display

    subgraph Rails
    direction LR
        BE[Back End]
        API[Web API]
    end

*: Front end obscures many unexplored phases. It is likely that the front end will need caches, databases, API abstractions (over sub-modules like network connectivity, etc.), and more. While these have not been expanded on, “Front end” stands in for all of that complexity here.

Gitaly

For fetching Diffs, Gitaly provides two basic utilities:

  1. Retrieve a list of modified files with associated pre- and post-image blob IDs for a set of revisions.
  2. Retrieve a set of Git diffs for an arbitrary set of specified files using pre- and post-image blob IDs.

sequenceDiagram
    Back end ->> Gitaly: "What files were modified between<br />this pair of/in this single revision?"
    Gitaly ->> Back end: List of paths
    Back end ->> Gitaly: "What are the diffs for this set of paths<br /> between this pair of/in this single revision?"
    Gitaly ->> Back end: List of diffs

Database

sequenceDiagram
    Back end ->> Database: What are the file paths for a known MR version?
    Database ->> Back end: List of paths

Cache

  • Fresh render of a diff

sequenceDiagram
    Back end ->> Cache: Give me the diff template for scenario XYZ
    Cache ->> Back end: Static template to render diff in scenario XYZ

  • Repeated render of a diff

sequenceDiagram
    Back end ->> Cache: Give me the compiled UI for diff ABC123
    alt Cache miss
        Cache ->> Back end: ☹️
        Back end ->> Cache: Cache the compiled UI for diff ABC123
    else
        Cache ->> Back end: Existing compiled diff UI
    end
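The repeated-render flow is a cache-aside pattern: ask the cache first, and on a miss render the diff and store the result. Below is a minimal in-memory sketch. The `RenderedDiffCache` class and its `fetch` method are hypothetical stand-ins, not the actual cache interface.

```python
from typing import Callable, Dict


class RenderedDiffCache:
    """In-memory stand-in for the cache shown in the diagram."""

    def __init__(self) -> None:
        self._store: Dict[str, str] = {}
        self.misses = 0

    def fetch(self, key: str, render: Callable[[], str]) -> str:
        """Return the cached UI for `key`, rendering and storing it on a miss."""
        if key not in self._store:
            self.misses += 1
            self._store[key] = render()
        return self._store[key]


render_calls = []


def render_abc123() -> str:
    render_calls.append("ABC123")  # track how often rendering really happens
    return "<table>compiled diff UI</table>"


cache = RenderedDiffCache()
first = cache.fetch("ABC123", render_abc123)   # miss: renders and stores
second = cache.fetch("ABC123", render_abc123)  # hit: served from the cache
```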

Diff Storage

sequenceDiagram
    Back end ->> Diff Storage: Give me the raw diff of this file
    Diff Storage ->> Back end: Raw diff

Backend

  • First files rendered on page load

sequenceDiagram
    participant Client
    participant Back end
    participant Authorization
    participant HAML
    participant Cache
    participant Database
    participant Diff storage
    participant Gitaly

    Client ->> Back end: Page load request
    Back end ->> Authorization: Check is good request
    alt Unauthorized
        Authorization ->> Back end: No!
        Back end ->> Client: 403 or 404
    else
        Authorization ->> Back end: Authorized.
        alt MR Diff
            Back end ->> Database: Get N files
            Database ->> Back end: Files
            Back end ->> Diff storage: Get diffs of N files
            Diff storage ->> Back end: Diffs
        else
            Back end ->> Gitaly: Get diffs of N files
            Gitaly ->> Back end: Diffs
        end
        loop Iterate through each diff file
            Back end ->> HAML: Render diff file
            HAML ->> Cache: Give me the cached rendered UI per file
            alt Cache miss
                Cache ->> HAML: Nada!
                HAML ->> Cache: Cache rendered UI per file
                Cache ->> HAML: Cached, rendered UI per file
            else
                Cache ->> HAML: Cached, rendered UI per file
            end
            HAML ->> Back end: Rendered UI
        end
        Back end ->> Client: Respond with application layout with rendered UI
    end

  • Future files rendered and streamed to the front end

sequenceDiagram
    participant Client
    participant Back end
    participant Authorization
    participant HAML
    participant Cache
    participant Database
    participant Diff storage
    participant Gitaly

    Client ->> Back end: Stream request
    Back end ->> Authorization: Check is good request
    alt All the possible unhappy paths
        Authorization ->> Back end: No!
        Back end ->> Client: 403
    else
        Authorization ->> Back end: Authorized.
        alt MR Diff
            Back end ->> Database: Get files
            Database ->> Back end: Files
            Back end ->> Diff storage: Get diffs
            Diff storage ->> Back end: Diffs
        else
            Back end ->> Gitaly: Get diffs
            Gitaly ->> Back end: Diffs
        end
        loop Iterate through each diff file
            Back end ->> HAML: Render diff file
            HAML ->> Cache: Give me the cached rendered UI per file
            alt Cache miss
                Cache ->> HAML: Nada!
                HAML ->> Cache: Cache rendered UI per file
                Cache ->> HAML: Cached, rendered UI per file
            else
                Cache ->> HAML: Cached, rendered UI per file
            end
            HAML ->> Back end: Rendered UI
        end
        Back end ->> Client: Stream rendered UI per file
    end
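Both back end flows share the same shape: authorize, pick the diff source, then render each file through the cache. The sketch below condenses that shape into one function under several assumptions: the sources are plain callables returning `(path, raw_diff)` pairs, the "MR Diff" branch folds the database and diff storage lookups into a single callable, the cache is a dictionary keyed by path alone, and a `<pre>` wrapper stands in for the HAML template.

```python
import html
from typing import Callable, Dict, List, Tuple

Diffs = List[Tuple[str, str]]  # (path, raw diff) pairs


def render_diffs(
    authorized: bool,
    is_merge_request: bool,
    from_diff_storage: Callable[[], Diffs],
    from_gitaly: Callable[[], Diffs],
    cache: Dict[str, str],
) -> Tuple[int, List[str]]:
    """Return an HTTP-like status and the rendered UI for each diff file."""
    if not authorized:
        return 403, []

    # "MR Diff" branch reads stored diffs; every other page asks Gitaly.
    diffs = from_diff_storage() if is_merge_request else from_gitaly()

    rendered = []
    for path, raw_diff in diffs:
        if path not in cache:  # cache miss: render once and store
            cache[path] = f"<pre>{html.escape(raw_diff)}</pre>"
        rendered.append(cache[path])
    return 200, rendered


page_cache: Dict[str, str] = {}
status, files = render_diffs(
    authorized=True,
    is_merge_request=False,
    from_diff_storage=lambda: [],
    from_gitaly=lambda: [("a.txt", "+x < y")],
    cache=page_cache,
)
```

A real cache key would also need to identify the revision being rendered; the path-only key here keeps the example short.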

Web API

The Web API provides both internal and public access to the back end implementation for diffs.

Eventually, this diagram should expand (and possibly split) to show each endpoint that our application or a user could interface with, and what each of those endpoints expects and returns.

Note that this is separate from the Back End diagrams, which elaborate on business logic and implementation details. The API endpoints are consumer-facing and so have different requirements and structures.

sequenceDiagram
    actor Web User
    participant Endpoints
    participant Back end

    Web User ->> Endpoints: Give me the diff for [x] file
    Endpoints ->> Back end: User [u] is requesting [x] diff
    Back end ->> Endpoints: Here is the resolved, rendered UI for that diff
    Endpoints ->> Web User: "Do with this diff whatever you'd like to"

A complete, single render

sequenceDiagram
actor User
participant UI
participant UX as Interaction handlers
participant FeApp as Front end behaviors
participant FeData as Data abstraction
participant FeNet as Network connectivity
participant API as Web API
participant BE as Back end
participant xxx
participant Cache
participant Database
participant Gitaly

User -->> BE: (MR page load)
BE ->> xxx: ???
xxx ->> Cache: ???
Cache ->> xxx: ???
xxx ->> Database: ???
Database ->> xxx: ???
xxx ->> Gitaly: ???
Gitaly ->> xxx: ???
xxx ->> BE: Rendered HTML
BE ->> User: A rendered diffs page for the MR

Accessibility

Reusable Rapid Diffs should be displayed in a way that is compliant with Web Content Accessibility Guidelines 2.1 level AA for web-based content and Authoring Tool Accessibility Guidelines 2.0 level AA for the user interface.

We recognize that in order to have an accessible experience using diffs in the context of GitLab, we need to ensure compliance both for displaying and for interacting with diffs. That’s why the accessibility audit and further recommendations will also consider the Content Editor feature used for reviewing changes.

ATAG 2.0 AA

Given the nature of diffs, the following guidelines will be our main focus:

  1. Guideline A.2.1: (For the authoring tool user interface) Make alternative content available to authors
  2. Guideline A.3.1: (For the authoring tool user interface) Provide keyboard access to authoring features
  3. Guideline A.3.4: (For the authoring tool user interface) Enhance navigation and editing via content structure
  4. Guideline A.3.6: (For the authoring tool user interface) Manage preference settings

HTML structure

The HTML structure of a diff should support assistive technology. For this reason, a table could be the preferred solution, because it indicates the logical relationships between the presented data and is easier for screen reader users to navigate with a keyboard. Labeled columns ensure that information such as line numbers can be associated with the edited piece of code.

Possible structure could include:

<table>
  <caption class="gl-sr-only">Changes for file index.js. 10 lines changed: 5 deleted, 5 added.</caption>
  <tr hidden>
    <th>Original line number: </th>
    <th>Diff line number: </th>
    <th>Line change:</th>
  </tr>
  <tr>
    <td>1234</td>
    <td></td>
    <td>.tree-time-ago ,</td>
  </tr>
  […]
</table>

See WAI tutorial on tables for more implementation guidelines.

Each file table should include a short summary of changes that will read out:

  • total number of lines changed,
  • number of added lines,
  • number of removed lines.

The summary of the table content can be placed either within the <caption> element, or before the table within an element referenced by aria-describedby. See WAI (Web Accessibility Initiative) for more information on both approaches.

However, if such a structure will compromise other functional aspects of displaying a diff, more generic elements together with ARIA support can be used.
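The per-file summary described above can be generated from the file's line counts. The sketch below builds the caption text in the same form as the example table and wraps it in a `<caption>` element. The function names are hypothetical, and treating the total as added plus removed lines is an assumption that matches the example (10 changed: 5 deleted, 5 added).

```python
import html


def diff_summary(filename: str, added: int, removed: int) -> str:
    """Summary text read out for one file table."""
    total = added + removed  # assumption: total = added + removed
    noun = "line" if total == 1 else "lines"
    return (
        f"Changes for file {filename}. "
        f"{total} {noun} changed: {removed} deleted, {added} added."
    )


def caption_html(filename: str, added: int, removed: int) -> str:
    """Visually hidden caption; the text is escaped because filenames are user data."""
    text = html.escape(diff_summary(filename, added, removed))
    return f'<caption class="gl-sr-only">{text}</caption>'


example_caption = caption_html("index.js", added=5, removed=5)
```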

Visual indicators

Each visual indicator should have screen reader text denoting its meaning. When needed, use the gl-sr-only class (in conjunction with focus:gl-not-sr-only if needed) to make the element accessible to screen readers but hidden from sighted users.

Some of the visual indicators that require alternatives for assistive technology are:

  • + or green highlighting to be read as added
  • - or red highlighting to be read as removed

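A minimal sketch of pairing a visual marker with its screen reader text follows. The `line_marker_html` helper is hypothetical, and hiding the visual glyph from assistive technology with `aria-hidden` is an assumption about one possible approach, not a decision made in this document.

```python
import html

# Screen reader text for each visual marker.
INDICATOR_TEXT = {"+": "added", "-": "removed"}


def line_marker_html(marker: str) -> str:
    """Render the visible marker plus a gl-sr-only label when one applies."""
    visible = f'<span aria-hidden="true">{html.escape(marker)}</span>'
    label = INDICATOR_TEXT.get(marker)
    if label is None:
        return visible  # unchanged lines carry no extra label
    return f'{visible}<span class="gl-sr-only">{label}</span>'


added_marker = line_marker_html("+")
context_marker = line_marker_html(" ")
```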
High-level implementation

Alternative Solutions

Historical context

Reusable Rapid Diffs introduce a paradigm shift in our approach to rendering diffs. Before this proposed architecture, we had two different approaches to rendering diffs:

  1. Merge requests heavily utilized client-side rendering.
  2. All other pages mainly used server-side rendering with additional behavior implemented in JavaScript.

In merge requests, most of the rendering work was done on the client:

  • The backend would only generate a JSON response with diffs data.
  • The client would be responsible for both drawing the diffs and reacting to user input.

This led to us adopting a virtualized scrolling solution for client-side rendering, which sped up drawing large diff file lists significantly.

Unfortunately, this came with the downsides of a very high maintenance cost and constant bugs. The user experience also suffered because we couldn’t show diffs right away when you visited a page, and had to wait for the JSON response first. Lastly, this approach ran completely parallel to the server-rendered diffs used on other pages, which resulted in two completely separate codebases for the diffs.

Summary of the alternative solutions attempted

Here is a list of the strategies we have adopted or simply tested in the past:

  • Full Server Side Rendering (adopted and replaced by Vue app): before the Vue refactor of the Merge Request Changes tab, diffs were fully rendered on the server. This resulted in long waits before the page started to render.
  • Frontend templates (Vue) Server Side Rendered (tested): results and impact weren’t compelling and pointed in the direction of partial SSR. (PoC MR)
  • Batch diffing (adopted): Break up the diffs into async paginated requests, increasing in size (slow start). Bootstrapping time remained unsatisfactory, and the page still spent a long time without content, which hurt perceived performance.
  • Virtual Scrolling (adopted): several known side-effects like inability to fully use native search functionality, interferences and weird behavior while scrolling to elements, overall strain on the browser to keep reflowing and painting. (Comparison with the proposed approach in this blueprint)
  • Repository Commits details paginated if too large (adopted): As an interim solution, really large commit diffs in the repository are now paginated, with a negative impact on UX: files and changes are hidden away across multiple pages.
  • Micro Code Review Frontend PoC (tested): This approach was significantly different from the application design used in the past, so it was never seriously explored as a way forward. Parts of this design - like custom elements and a reliance on events - have been incorporated into alternative approaches. (Micro Code Review Frontend PoC)
  • Streaming Diffs using a node server (tested): Combines streaming with a dedicated Node.js server. Precursor to the proposed SSR approach in this blueprint. (PoC: Streaming diffs app)

Proposed changes

These changes (indicated by an arbitrary name like “Design A”) suggest a proposed final path forward for this blueprint, but have not yet been accepted as the authoritative content.

  • Mark the highest hierarchical heading with your design name. If you are changing multiple headings at the same level, make sure to mark them all with the same name. This will create a high-level table of contents that is easier to reason about.

Front end (Design A)

High-level implementation

NOTE: This draft proposal suggests one potential front end architecture which may not be chosen. It is not necessarily mutually exclusive with other proposed designs.

(See New Diffs: Technical Architecture Design for nicer visuals of this chart)

flowchart TB
    classDef sticky fill:#d0cabf, color:black
    stickyMetricsA>"Metrics 3, 4, & 5 apply to<br>the entire front end application"]

    stickyMetricsA -.- fe
    fe

    Socket((WebSocket))

    be

subgraph fe [Front End]
    stickyMetricsB>"Metrics 1 & 2 apply<br>to all UI elements"]
    stickyInbound>"All data is formatted precisely<br>how the UI needs to interact with it"]
    stickyOutbound>"All data is formatted precisely<br>how the back end expects it"]
    stickyIdb>"Long-term.

    e.g. diffs, MRs, emoji, notes, drafts, user-only data<br>like file reviews, collapse states, etc."]
    stickySession>"Session-term.

    e.g. selected tab, scroll position,<br>temporary changes to user settings, etc."]

    Events([Event Hub])
    UI[UI]
    uiState((Local State))
    Logic[Application Logic]
    Normalizer[Data Normalizer]
    Inbound{{Inbound Contract}}
    Outbound{{Outbound Contract}}
    Data[Data Access]
    idb((indexedDB))
    session((sessionStorage))
    Network[Network Access]
end

subgraph be [Back End]
    stickyApi>"A large list of defined actions a<br>Diffs/Merge Request UI could perform.

    e.g.: <code>mergeRequest:notes:saveDraft</code> or<br><code>mergeRequest:changeStatus</code> (with <br><code>status: 'draft'</code> or <code>status: 'ready'</code>, etc.).

    Must not expose any implementation detail,<br>like models, storage structure, etc."]
    API[Activities API]
    unk[\"?"/]

    API -.- stickyApi
end

    %% Make stickies look like paper sort of?
    class stickyMetricsA,stickyMetricsB,stickyInbound,stickyOutbound,stickyIdb,stickySession,stickyApi sticky

    UI <--> uiState
    stickyMetricsB -.- UI
    Network ~~~ stickyMetricsB

    Logic <--> Normalizer

    Normalizer --> Outbound
    Outbound --> Data
    Inbound --> Normalizer
    Data --> Inbound

    Inbound -.- stickyInbound
    Outbound -.- stickyOutbound

    Data <--> idb
    Data <--> session
    idb -.- stickyIdb
    session -.- stickySession

    Events <--> UI
    Events <--> Logic
    Events <--> Data
    Events <--> Network

    Network --> Socket --> API --> unk
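The decoupling in this chart can be illustrated with a small event hub: the UI, application logic, and network access only talk through events, and an outbound contract shapes data the way the back end expects it. This is a sketch under stated assumptions. The `EventHub` class, the `ui:draftSaved` and `network:send` event names, and the payload shape are hypothetical; only the `mergeRequest:notes:saveDraft` action name comes from the chart.

```python
from collections import defaultdict
from typing import Any, Callable, DefaultDict, Dict, List

Handler = Callable[[Dict[str, Any]], None]


class EventHub:
    """Minimal publish/subscribe hub connecting otherwise unrelated modules."""

    def __init__(self) -> None:
        self._handlers: DefaultDict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, event: str, handler: Handler) -> None:
        self._handlers[event].append(handler)

    def publish(self, event: str, payload: Dict[str, Any]) -> None:
        for handler in list(self._handlers[event]):
            handler(payload)


def to_outbound(text: str, line: int) -> Dict[str, Any]:
    """Outbound contract: format data the way the back end expects it."""
    return {
        "action": "mergeRequest:notes:saveDraft",
        "params": {"body": text, "line": line},
    }


hub = EventHub()
sent: List[Dict[str, Any]] = []

# "Network access" records outbound messages; it knows nothing about the UI.
hub.subscribe("network:send", sent.append)

# "Application logic" reacts to a UI event and publishes a normalized message.
hub.subscribe(
    "ui:draftSaved",
    lambda event: hub.publish("network:send", to_outbound(event["text"], event["line"])),
)

# The "UI" only announces what happened.
hub.publish("ui:draftSaved", {"text": "Looks good", "line": 12})
```

Because each module depends only on event names and contracts, the UI or the data format can change independently, which is the property the Frontend section asks for.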

Diffs features

This is an appendix to the Reusable Rapid Diffs document.

Below is a complete list of features for merge request and commit diffs grouped by diff viewers (Code, Image, Other).

✓ – available in both MR and Commit views.

Features Code Image Other
Filename
Copy file path
Collapse and expand file
File stats
Lines changed (0 for blobs)
Permissions changed
CRUD comment on file
View file link
Mark as viewed MR MR MR
Hide all comments MR MR MR
Show full file (expand all lines) MR
Open in Web IDE link MR
Line link
Edit file link
Code highlight (multiple themes)
Expand lines
CRUD comment on specific line Commit
CRUD comment on line range MR
Draft comment on line range MR
Code quality highlights
Test coverage highlights
Hide whitespace changes
Auto-collapse large file
View as raw Commit
Side by side view
Last modified August 23, 2024: Ensure frontmatter is consistent (e47101dc)