Database Lexicon - terms and definitions relating to our database

This is not a comprehensive list of all of the commonly used terms, but rather it is a list of terms that are commonly confused or conflated with other terms. In each section we will identify common phrases, define our specific usage and list external references for the term in question.

Commonly Used Terminology

This is not a comprehensive list of all of the commonly used terms, but rather it is a list of terms that are commonly confused or conflated with other terms. In each section we will identify common phrases, define our specific usage and list external references for the term in question.

Partitioning

For our purposes, when referring to partitioning we are referring to a local optimization of splitting one large table into many small tables. Additionally, this is done in a way such that the application accessing this partitioned table is mostly unaware of the changes. Queries benefit from partitioning only if they use the partitioning key as a filter.

Definition - “Partitioning refers to splitting what is logically one large table into smaller physical pieces.” Reference: PostgreSQL: Documentation: 11: 5.10. Table Partitioning

To learn more about the implementation details and performance improvements of partitioning please read the Database Team handbook entry: Database Partitioning

Sharding

The simplest definition of sharding is the process of breaking up portions of your database (e.g. large tables) into smaller tables and distributing those smaller tables across multiple servers. Where sharding can get confusing, especially with PostgreSQL, is that most definitions of sharding use the term partitioning to describe how the tables are broken down and distributed. This can lead to the terms sharding and partitioning being conflated.

For our purposes, when referring to sharding we are referring to horizontally distributing data across multiple database clusters (“shards”). Each shard only contains a subset of the overall data.

Application Sharding

Another variant of sharding is application sharding This typically refers to an architecture that implements routing logic to route data requests from the application to the correct data repository. During our discussions in the Sharding Working Group we typically referred to the Shopify Pods Architecture as an example of application sharding.

Foreign Data Wrapper (FDW)

Foreign Data Wrappers allow for accessing remote data from multiple data sources. FDW can be used to create an integrated view from your PostgreSQL instance into many different types of resources, but for our purposes we are focusing on integrating multiple PostgreSQL servers. Foreign Data Wrappers (FDW) are an implementation detail that can change your PostgreSQL partitioning implementation to what is considered a Built-in Sharding implementation.

Partitioning with FDW = Sharding

With PostgreSQL, partitioning is the first implementation step taken on the path to sharding using Foreign Data Wrappers (FDW). More information and implementation details can be found in the Database Team Handbook Entry: PostgreSQL 11 sharding with foreign data wrappers and partitioning

Last modified November 13, 2024: Move gitaly pages over to data access (c16c2006)