GitLab.com SAAS Data Pipeline
The GitLab.com SAAS Data Pipeline Working Group
Attributes
Property | Value |
---|---|
Date Created | 2022-09-30 |
Target End Date | 2023-10-01 |
Slack | #wg_gitlab_to_snowflake_pipeline (only accessible from within the company) |
Google Doc | Working Group Agenda (only accessible from within the company) |
Issue Board | Issue Board |
Business Goal
Provide a reliable, stable, scalable and secure data feed for reporting and analytics use cases such as MR Rate & Performance KPIs (Product development), Product Usage for Gainsight, License Utilization (Customer Success), Product Health Dashboard (Health), Propensity to Expand and Churn Models, Forecast, and Territory Management (Sales).
Technical Goal
Implement a data pipeline that captures `create`, `update` and `delete` operations and stores them in the GitLab Data platform to be available for further processing and analytics.
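As a minimal sketch of what capturing these three operations could look like, the Python below models each change as an event and replays the events into an in-memory target. The `ChangeEvent` type, the `apply_events` function and the `is_deleted` tombstone flag are illustrative assumptions, not the working group's actual implementation.

```python
from dataclasses import dataclass
from typing import Any, Dict, Iterable, Optional


@dataclass(frozen=True)
class ChangeEvent:
    """Hypothetical change record; field names are illustrative only."""
    op: str                               # "create", "update" or "delete"
    key: int                              # primary key of the affected row
    row: Optional[Dict[str, Any]] = None  # new column values; None for deletes


def apply_events(target: Dict[int, Dict[str, Any]],
                 events: Iterable[ChangeEvent]) -> Dict[int, Dict[str, Any]]:
    """Replay events in order into `target`, keyed by primary key."""
    for event in events:
        if event.op == "create":
            target[event.key] = {**(event.row or {}), "is_deleted": False}
        elif event.op == "update":
            current = target.get(event.key, {"is_deleted": False})
            target[event.key] = {**current, **(event.row or {})}
        elif event.op == "delete":
            # Keep a tombstone so downstream analytics can still see the deletion.
            current = target.get(event.key, {})
            target[event.key] = {**current, "is_deleted": True}
        else:
            raise ValueError(f"unknown operation: {event.op!r}")
    return target
```

Keeping deleted rows as tombstones rather than removing them is one possible design choice; it lets downstream reports distinguish a row that was deleted from a row that never existed.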
Exit Criteria
- Reliable: All ‘metadata’ and relevant descriptive data (`create`, `update` and `delete`) operations in the PostgreSQL database are captured, transferred and loaded into the data platform within 6 hours of the original event. Partially achieved.
  - Background migrations are not captured because they do not update the `updated_at` column.
- Stable: End to end solution runs for 30 days without interruption or human intervention. Achieved.
- Scalable: Solution scales with GitLab.com and does not impose an undue burden to expand as new GitLab.com databases, shards, zones, tenants, or cloud locations are deployed. Partially achieved.
  - This is not possible with the existing architecture. As the architecture changes, it is included as a requirement.
- Secure: Red data is not loaded or it is hashed if stored on a system that is not a data sub-processor. Achieved.
- Defined and agreed SLO with all business partners. Achieved.
- Full end to end monitoring. Achieved.
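The gap noted under the Reliable criterion can be illustrated with a small sketch. It assumes, purely for illustration, that changes are picked up by comparing each row's `updated_at` value against the time of the previous extraction; the `extract_incremental` function and the sample rows are hypothetical and do not describe the actual pipeline.

```python
from datetime import datetime, timedelta
from typing import Any, Dict, List

Row = Dict[str, Any]


def extract_incremental(rows: List[Row], watermark: datetime) -> List[Row]:
    """Return rows whose `updated_at` is strictly newer than the watermark."""
    return [r for r in rows if r["updated_at"] > watermark]


last_run = datetime(2023, 1, 1, 12, 0)
rows = [
    # Regular application update: `updated_at` moves forward and the row is picked up.
    {"id": 1, "state": "merged", "updated_at": last_run + timedelta(hours=1)},
    # Row rewritten by a background migration that left `updated_at` untouched:
    # the content changed, but the timestamp still predates the watermark.
    {"id": 2, "state": "backfilled", "updated_at": last_run - timedelta(days=30)},
]

picked = extract_incremental(rows, last_run)
print([r["id"] for r in picked])
```

Under this assumption only row 1 is extracted, which is why a change that does not touch `updated_at` stays invisible to a timestamp-based extraction.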
Roles and Responsibilities
Working Group Role | Person | Title |
---|---|---|
Executive Stakeholder | David Sakamoto | VP, Customer Success |
Working Group DRI / Facilitator | Dennis van Rooijen | Senior Manager Data |
Functional Lead Data | Ved Prakash | Staff Data Engineer |
Functional Lead Infra | | |
Member | t.b.d. | Senior Data Engineer Big Data |
Member | Justin Wong | Data Engineer |
Member - Infra | Gerardo Lopez-Fernandez | Engineering Fellow, Infrastructure |
Member - Database Reliability | Alexandar Sosna | Senior Database Reliability Engineer |
Member - Product | | |
Member - Security | Kyle Smith | Sr. Security Risk Engineer |
Member - Development | Nick Nguyen | Senior EM, Enablement: Data Stores |
Last modified July 9, 2024: Fix links and spelling (e30f31b6)