Data Guide to People Data Analysis

Objectives for this page

This handbook page provides an overview of data definitions and data models used by the People Analytics team. Use this page for clarification on terminology, calculations, or models that use People data. If you have any questions please don’t hesitate to reach out to someone from the team for help.

Helpful places to start

  • DBT Docs - This resource contains comprehensive documentation on all available dbt models. This is a great starting point to understanding our models. For specific People Models, please reference the Commonly Used Data Models section below for a starting point.
  • Definitive guides to data subject areas managed by the Data team.
  • Documentation on data pipelines for the technically curious analyst. This page goes into each data source and extraction details.
  • People Group Tech Stack Guide for overview of all the integrations that go into and out of our HR systems and all the tools we use.

People Group Data Dictionary

Employment Data

Data pertaining to the worker and employment at GitLab once hired

Click to expand
Name Description Calculation Is Sensitive
Team Member A person that worked for GitLab at any point in time, has an employee id in one of our Human Resource Management Systems (HRMS), and is not a temporary (contingent) worker or intern
Employee ID Unique ID assigned in our HRMS upon hire to track team member’s employment with GitLab Yes
Hire Date The first day of employment with GitLab as a team member
Termination Date The first day a team member is no longer employed with GitLab as a team member termination effective date in HRMS + 1 day Yes, if future-dated
Employee Type The employment type for a team member’s primary position. We typically exclude “Consultants” and “Interns” from reporting
Company / Entity Company used to employ or contract team members that is location-specific. These can be in the form of entities or professional Employer Organizations (PEO)
Termination Type Business process reason category associated with the termination business process. Can be either Voluntary or Involuntary Yes
Exit Impact If worker is specified as regrettable termination on the termination transaction they are considered a “regrettable” termination. Otherwise they are considered a “non-regrettable” termination Yes
Headcount The total number of team members employed at GitLab on a particular date
Begin Headcount Used for Average Headcount calculation. The number of team members active at the beginning of each period in a date range. This is usually the first day of the month.
End Headcount Used for Average Headcount calculation. The number of team members active at the end of each period in date range. This is usually the last day of the month.
Average Headcount The average number of team members in a specified date range. This is used to standardize rates across business areas. (sum of all Begin Counts in period / # of periods observed + sum of all End Counts in period / # of periods observed) / 2
Net Growth Metric to show growth based only on the difference between hires and terminations in a selected period # of Hires in period - # of Terminations in period
Turnover / Attrition The percentage of employees who leave an organisation over a specific period, typically on a rolling 12 month basis # of Terminations / Average Headcount Yes, Internal Only
Retention The percentage of employees that remain with an organisation over a given period. 1 - Turnover Yes, Internal Only
Voluntary Turnover Portion of turnover that is Voluntary termination type # of Voluntary Terminations / Average Headcount Yes, Internal Only
Involuntary Turnover Portion of turnover that is Involuntary termination type # of Involuntary Terminations / Average Headcount Yes, Internal Only
Regrettable Turnover Portion of turnover that is regrettable terminations # of Regrettable Terminations / Average Headcount Yes, Internal Only
Non-Regrettable Turnover Portion of turnover that is non-regrettable terminations # of Non-Regrettable Terminations / Average Headcount Yes, Internal Only

Diversity, Inclusion, and Belonging (DIB)

Data pertaining to the demographics of a worker and how they choose to identify

Click to expand
Name Description Calculation Is Sensitive
Underrepresented Group (URG) While there are many types of underrepresented groups (URGs), in reporting URGs are based on ethnicity and we consider any ethnicity that is not White, Asian, or Did Not Identify as URG. This metric is now measured globally instead of country-specific. Yes
Non-URG In reporting, Non-URG is considered White or Asian ethnicities Yes
Gender The gender of the worker which is used in reporting. Can currently be marked as Male, Female or left unindentified Yes
Gender Identity The gender identification of the individual. Examples include transgender, cisgender, genderfluid, genderqueer, non-binary Yes

Position

Information related to the position a team member holds at GitLab

Click to Expand
Name Description Calculation Is Sensitive
Management Level The management level for the job profile.
Leadership A grouping of management levels that are director or higher (Director, Vice President, Leader, Chief Executive Officer)
Management A grouping of management levels that are manager or higher (Manager, Director, Vice President, Leader, Chief Executive Officer)
Job Grade The job level for the job profile. Yes
People Managers Any team members that have direct reports assigned to them in Workday
Layers The number of layers between a team member and the CEO where layer 1 is the CEO
Direct Reports The group of team members that report to a people manager
Span of Control The number of direct reports that are assigned to a people manager
Average Span of Control The number of direct reports on average assigned to people managers # of Direct Reports / # of People Managers
Location Factor Location Factor is calculated using multiple data sources to conduct a market analysis of compensation rates globally: Economic Research Institute (ERI), Comptryx, Radford. This is not a cost of living analysis, but instead a cost of labor market evaluation compared to San Francisco Yes
Compa-Ratio Compa-ratio is a measurement of pay that compares an employee’s salary to the median compensation for similar positions within a company or a target market. Divide the employee’s salary by the market rate compensation midpoint Yes
Discretionary Bonus An additional bonus amount that is awarded at the discretion of the company management. It is separate from standard bonus plans and is not based on specific performance metrics. Discretionary bonuses allow companies flexibility to reward standout employees.
Discretionary Bonus Rate Percentage of Team members that receive a discretionary bonus from the population. For KPIs this is reported as a rolling 3 month average. # of Discretionary Bonuses / Average Headcount
Promotion A promotion is counted if the business process type submitted in our HRMS is “Promote Employee Inbound”
Lateral Transfer A lateral move is counted if the business process category submitted in our HRMS is “Lateral Move” and there was a change in job title
Internal Mobility Rate Internal Mobility Rate refers to the frequency at which employees change positions within an organization, indicating the internal movement and career progression opportunities available to staff, as well as the company’s ability to retain talent by offering growth paths. # Lateral Transfers and Promotions / Average Headcount

Information related to Paid Time Off and Absence

Click to expand
Name Description Calculation Is Sensitive
Paid Time Off (PTO) Time off that is submitted and approved in the Absence Management System, does not fall on a weekend, does not exceed 25 days in a single request, and excludes PTO types like Extended Leave, CEO Shadow Program, and Conference.
Out Sick - Extended PTO Type PTO submission that is listed as “Out Sick” but is for 5+ days so is reclassified in reporting
Average PTO Days Average number of days in period that are PTO days. This calculation normalizes PTO usage for all team members to account for those who have not worked all days in the reporting period. (# PTO Days Used / # Days Worked) * # Days in Period
Full-Cycle Tenure Group Team members who have worked all days in a reporting period so normalized PTO matches actual PTO used
In/Out Tenure Group Team members who have not worked all days in a reporting period so normalized PTO will reflect number of days used for the full period based on actual usage in that period

Recruiting and Onboarding

Data pertaining to the Onboarding process and Recruiting metrics used by Talent Acquisition

Click to expand
Name Description Calculation Is Sensitive
Onboarding Satisfaction (OSAT) Score Average rating on a 1-5 scale for all questions in the Week One onboarding survey in Culture Amp. Usually calculated as a rolling 3-month average. (Average of All Month 1 Onboarding Responses + Average of All Month 2 Onboarding Responses + Average of All Month 3 Onboarding Responses) / 3
Starts External applicants who have accepted an offer with GitLab and have a start date scheduled in the HRMS
External Applicants Applicants whose source is not “Internal Applicant”. They are not employed through GitLab at the time when the application is submitted
Internal Applicants Applicants whose source is “Internal Applicant”. They are employed through GitLab at the time when the application is submitted
Time to Hire / Candidate Throughput Days between the application submitted date and the date an application moved in the “Hire” stage if they also have an accepted offer.
Verbal Accepted Offer % Applications that moved into the Offer stage and accepted / # Moved into the Offer stage and resolved # of Applications moved into Offer Stage and Accepted Offers / # of Applications moved into the Offer Stage and Accepted or Rejected Offers
Written Offer Accept % Offers that resolved in a date range. # of accepted offers / # of resolved offers. # of Applications that Accepted Offers / # of Applications that Accepted or Rejected Offers
Internal Mobility Rate % Offers that resolved in a date range. # of accepted offers from internal applicants / # of accepted offers # of Internal Applications that Accepted Offers / # of Applications that Accepted Offers
Source Rate % of applications submitted in a date range that were sourced (Sourcing Strategy is “Prospecting” and Source is not “SocialReferral”) and candidate was from a diverse source (based on candidate tags) as a percentage of all sourced applications # of Sourced Applications submitted with Diverse Candidate Tags / # of Sourced Applications
Source Mix % of accepted offers by external sourcing strategies (not internal applicants or acquisitions) with a resolved date in date range # of External Applications with Accepted Offers in Specific Sourcing Strategy / # of External Applications with Accepted Offers
Conversion Rate # of Applications submitted in a date range and what stages of the recruiting process that have gone through. It is the flow of applications between stages of the recruiting process showing what percent of applications from the previous step move into the next step (ex- if 10 applications were in Initial Interview and 2 made it to Team Interview phase, conversion rate for the Initial Interview would be 20%) # of Applications in Second Stage / # of Applications in First Stage
Pending Offers Offers that have been sent but not accepted or rejected
Time to Fill Days between an Opening opened date and closed date for Openings that closed in a specified date range and list a hired application
Time to Start Days between an Opening opened date and the expected start date for a hired application
Requisition Aging Average days that Openings were open at any point in a date range. This calculation averages days open for all days in the reporting period.

Systems

Systems used by the People Group

Click to expand
Name Description Calculation Is Sensitive
R (Language) A programming language and software environment used for statistical analysis, graphical representation, and reporting, popular in data analysis and scientific research.
Tableau Primary data visualization tool at GitLab
Workday Current Human Resource Management Systems (HRMS) and the SSoT for all team member related data on or after 2022-06-16
BambooHR Former Human Resource Management Systems (HRMS) and the SSoT for all team member related data prior to 2022-06-16
Greenhouse Current Applicant Tracking System (ATS) at GitLab and the SSOT for all recruiting metrics
Time Off by Deel Current absence management tool at GitLab integrated with BambooHR. SSoT for time-off related data
Sisense (formerly Periscope) Former data visualization tool at GitLab
FiveTran Fivetran is the automated data movement platform moving data out of Workday and into Snowflake
SnowFlake Snowflake is our Enterprise Data Warehouse (EDW) and is the core technology in our Enterprise Data Platform.
Level Up / GitLab University Training platform created by Thought Industries for continuing education and career development

Data Models

The Data Team is working with the People Analytics team to build data marts that allow users to explore our different people data sources

  • “mart” models are a combination of dimensions and facts that are joined together to enable easy analysis.
  • “rpt” (“report”) models are built with specific business logic for a specific use case.

Underneath each model is a clean lineage of dimensions and facts that can also be used for analysis. This list included all the prep tables, dimensions and facts that have been created for people analysis as well as the ones that we are planning to build in the upcoming quarters based on the Team Member Common ERD

Prep, dimension and fact tables

Model Name Table Type Grain Source Status Documentation
prep_team_member Staging One row per Team Member ID per event Workday Completed DBT docs
dim_team Dimension One row per Team ID per event Workday Completed DBT docs
dim_team_member Dimension One row per Team Member ID per event Workday Completed DBT docs
fct_team_member_position Fact One row per employee_id, team_id, effective_date and date_time_initiated combination Workday Completed DBT docs
fct_team_member_status Fact One row per employee_id, employment_status and status_effective_date combination Workday Completed DBT docs
fct_team_status Fact One row per employee_id and valid_from combination Workday Completed DBT docs
fct_team_member_absence Fact One row per Team Member ID, pto_uuid and absence_date combination Time Off By Deel Completed DBT docs
fct_team_member_locality Fact Workday Planned DBT docs
fct_team_demographic Fact Workday Planned DBT docs

Marts

Model Name Table Type Grain Status Documentation
mart_team_member_directory Mart One row per employee ID Completed DBT docs
mart_team_member_absence Mart One row per Team Member ID, pto_uuid and absence_date combination Completed DBT docs

Model usage

dim_team_member

This table contains team members work and personal information. Sensitive columns are masked using dynamic masking and the fields are only visible by team members with the analyst_people role assigned in Snowflake. This table is a hybrid SCD (Type 1 + Type 2).

The table includes information regarding current team members, new hires who have records created in Workday before their start date and team members who were terminated in 2021 onwards. Team members who were terminated before 2021 are not captured in this model at this time. The grain of this table is one row per employee_id per valid_to/valid_from combination.

Query - Team members count per region
SELECT
  region,
  COUNT(DISTINCT employee_id)
FROM
  PROD.COMMON.DIM_TEAM_MEMBER
WHERE
  is_current = TRUE AND is_current_team_member = TRUE
GROUP BY
  region
Query - Current team members total count
SELECT
  COUNT(DISTINCT employee_id)
FROM
  PROD.COMMON.DIM_TEAM_MEMBER
WHERE
  is_current = TRUE AND is_current_team_member = TRUE
Query - Percentage of key talent in the team member population

key_talent_status is a masked field, only team members with the analyst_people role in Snowflake can query it

SELECT
  key_talent_status,
  COUNT(*) * 100 / SUM(COUNT(*)) OVER() AS key_talent_percentage
FROM
  PROD.COMMON.DIM_TEAM_MEMBER
WHERE
  is_current = TRUE AND is_current_team_member = TRUE
GROUP BY 1

dim_team

dim_team contains team (organizations) information. It includes information regarding teams and their hierarchy. It is a Type 2 SCD.

The goal of this table is to determine the team superior organization and the hierarchy of every organization in Workday. Additionally, this table provides team data such as team name, manager, team member count and date of inactivation for each organization.

The grain of this table is one row per Team ID per event.

Query - Total count of active organizations
SELECT
  COUNT(*)
FROM
  PREP.SENSITIVE.DIM_TEAM
WHERE
  is_current = TRUE AND is_team_active = TRUE
Query - Count of current team members
SELECT
  SUM(team_members_count)
FROM
  PREP.SENSITIVE.DIM_TEAM
WHERE
  is_current = TRUE AND is_team_active = TRUE

fct_team_member_position

fct_team_member_position contains team members’ job history, including any changes in their job profile or team. It provides a history of the team member’s job profile in detail. The grain of this table is one row per employee_id, team_id, effective_date and date_time_initiated combination. It includes all team members, regardless of their current employment status.

Query - Number of employees per entity

Entity is a masked field, only team members with the analyst_people role in Snowflake can query it

SELECT
  entity,
  COUNT(*)
FROM
  PROD.COMMON.FCT_TEAM_MEMBER_POSITION
WHERE
  is_current = true
GROUP BY 1
Query - Number of employees per position/role
SELECT
  position,
  COUNT(*)
FROM
  PROD.COMMON.FCT_TEAM_MEMBER_POSITION
WHERE
  is_position_active
    AND is_current
GROUP BY 1
Query - Number of employees with a specific job specialty
SELECT
  COUNT(*)
FROM
  PROD.COMMON.FCT_TEAM_MEMBER_POSITION
WHERE
  (job_specialty_single LIKE '%ModelOps%' OR job_specialty_multi LIKE '%ModelOps%')
    AND is_current;
Query - Team members with position = backend engineers in France

Entity is a masked field, only team members with the analyst_people role in Snowflake can query it

SELECT
  *
FROM
  PROD.COMMON.FCT_TEAM_MEMBER_POSITION
WHERE
  position LIKE '%Backend Engineer%'
    AND entity = 'GitLab France S.A.S.'
     AND is_current

fct_team_member_status

This table contains termination reason, type, exit impact and employment status. Sensitive columns are masked and only visible by team members with the analyst_people role assigned in Snowflake. This table contains only past terminations.

The grain of this table is one row per employee_id, employment_status and status_effective_date combination.

Number of employees whose departure caused an impact to the company

Exit impact is a masked field, only team members with the analyst_people role in Snowflake can query it

SELECT
  COUNT(*)
FROM
  PROD.COMMON.fct_team_member_status
WHERE
  exit_impact = 'Yes';
Number of employees whose departure was voluntary

Termination type is a masked field, only team members with the analyst_people role in Snowflake can query it

SELECT
  COUNT(*)
FROM
  PROD.COMMON.fct_team_member_status
WHERE
  termination_type = 'Resignation (Voluntary)';
Termination reason provided by employees whose departure was voluntary and caused an exit impact

Termination type, termination reason and exit impact are masked fields, only team members with the analyst_people role in Snowflake can query it

SELECT
  DISTINCT termination_reason
FROM
  PROD.COMMON.fct_team_member_status
WHERE
  termination_type = 'Resignation (Voluntary)'
    AND exit_impact = 'Yes';

fct_team_status

This table is a derived fact from fct_team_member_status and fct_team_member_position. Sensitive columns are masked and only visible by team members with the analyst_people role assigned in Snowflake. This table only contains one change in the team member’s position per effective date, as opposed to the fct_team_member_position table which contains all changes to a team member’s position profile, regardless of whether they became effective or not. This table doesn’t include future hires, only people working at GitLab as of today’s date.

Query - Active team members with position = backend engineers in France

Entity is a masked field, only team members with the analyst_people role in Snowflake can query it

SELECT
  *
FROM
  PROD.COMMON.fct_team_status
WHERE
  position LIKE '%Backend Engineer%'
    AND entity = 'GitLab France S.A.S.'
      AND employment_status = 'Active'
        AND is_current
Query - Number of resignations (voluntary) per team

Termination type and termination reason are masked fields, only team members with the analyst_people role in Snowflake can query it

SELECT
  team_id,
  COUNT(*)
FROM
  PROD.COMMON.fct_team_status
WHERE
    employment_status = 'Terminated'
       AND termination_type = 'Resignation (Voluntary)'
           AND is_current
GROUP BY 1;
Query - Number of regresignations (voluntary vs involuntary) per management level

Termination type and termination reason are masked fields, only team members with the analyst_people role in Snowflake can query it

SELECT
  management_level, COUNT(*)
FROM
  PROD.COMMON.fct_team_status
WHERE
    employment_status = 'Terminated'
       AND termination_type = 'Resignation (Voluntary)'
          AND management_level != 'Individual Contributor'
           AND is_current
GROUP BY 1;

fct_team_member_absence

This table contains team members’ absence information. Sensitive columns are masked using dynamic masking and the fields are only visible by team members with the analyst_people role assigned in Snowflake. This table is a Type 0 SCD

The table includes information from Time Off By Deel, . The grain of this table is one row per team member ID per pto_uuid and absence_date combination.

mart_team_member_absence

This table is a derived mart from fct_team_member_absence and dim_team_member. Sensitive columns are masked and only visible by team members with the analyst_people role assigned in Snowflake. This table will be a replacement of the workspace table wk_pto.This table is a hybrid SCD (Type 0 + Type 2).

The grain of this table is one row per team member ID per pto_uuid and absence_date combination.

Query - Absence days for an employee by quarter for the year 2024
WITH final AS (
  SELECT
    *,
    DATEDIFF(DAY, absence_start, absence_end) + 1 AS pto_days_requested,
    ROW_NUMBER() OVER (
      PARTITION BY
        employee_id,
        absence_date
      ORDER BY
        absence_end DESC,
        pto_uuid DESC
    )                                             AS pto_rank
  FROM prod.common_mart_people.mart_team_member_absence
  WHERE absence_date <= CURRENT_DATE
    AND pto_days_requested <= 25
    AND COALESCE(pto_group_type, '') != 'EXL'
    AND NOT COALESCE(absence_status, '') IN ('CEO Shadow Program', 'Conference', 'Customer Visit')
  QUALIFY pto_rank = 1
)

SELECT
  QUARTER(absence_date) AS quarter,
  absence_status,
  COUNT(absence_date)   AS absence_count
FROM final
WHERE full_name = 'John Doe'
  AND YEAR(absence_date) = 2024
GROUP BY QUARTER(absence_date), absence_status
ORDER BY quarter ASC, absence_count DESC;

mart_team_member_directory

This table is a derived fact from fct_team_member_position and dim_team. Sensitive columns are masked and only visible by team members with the analyst_people role assigned in Snowflake. This table will become a replacement of the legacy tables employee_directory_* once all the BambooHR data has been included in the upstream tables.

The grain of this table is one row per employee per valid_from/valid_to combination.

Average location factor by division
SELECT
    directory.division,
    DATE_TRUNC('month', dates.date_actual) AS month,
    ROUND(AVG(location_factors.location_factor),2) AS average_location_factor
  FROM PROD.COMMON_MART_PEOPLE.MART_TEAM_MEMBER_DIRECTORY AS directory
  INNER JOIN PROD.LEGACY.DATE_DETAILS AS dates
    ON dates.date_actual >= directory.valid_from
        AND dates.date_actual < directory.valid_to
  LEFT JOIN PREP.SENSITIVE.EMPLOYEE_LOCATION_FACTOR_SNAPSHOTS AS location_factors
    ON REPLACE(location_factors.bamboo_employee_number,',','') = directory.employee_id
        AND NOT (directory.valid_to <= location_factors.valid_from
          OR directory.valid_from >= location_factors.valid_to)
  WHERE location_factors.location_factor IS NOT NULL
      AND directory.is_current AND directory.is_current_team_member
  GROUP BY 1, 2
Tenure bucket per team member
SELECT
  employee_id,
  hire_date,
  DATEDIFF(day, hire_date, CURRENT_DATE())           AS tenure_in_days,
  CASE WHEN tenure_in_days BETWEEN 0 AND 183 THEN '0 - 6 Months'
       WHEN tenure_in_days BETWEEN 184 AND 365 THEN '6 - 12 Months'
       WHEN tenure_in_days BETWEEN 366 AND 1095 THEN '1 - 3 Years'
       WHEN tenure_in_days BETWEEN 1096 AND 2190 THEN '4 - 6 Years'
       WHEN tenure_in_days > 2191 THEN '6+ Years'
       ELSE null
   END                                                  AS tenure_bucketed
FROM PROD.COMMON_MART_PEOPLE.MART_TEAM_MEMBER_DIRECTORY
WHERE is_current AND is_current_team_member

Legacy and Other Data Models

Legacy models are models we will be transitioning from at some point but are still being used for reporting.

Legacy Workday/BambooHR Data Models

Click to expand
Database Schema Table Name Data Grain Description Notes
prod legacy employee_directory_analysis employee_id by date_actual Gives the current state of the employees at GitLab at any point of time. This is the model to use for headcount, team size, or any people-related analysis for employees. This has current and past employees, as well as their department, division, and cost center and hire/termination dates.
prod legacy bamboohr_rpt_headcount_aggregation department, division, eeoc_value This report creates out a headcount report from the bamboohr_headcount_intermediate to be used for Sisense dashboards for each month. The division reporting is based on current division used.
prep sensitive employee_directory_intermediate employee_id by date_actual INCLUDES SENSITIVE INFORMATION. The master collection of all info about GitLab employees for their latest position.
prep sensitive bamboohr_employment_status_xf employee_id by valid_from_date This model provides a transaction record of an employee’s status changes (i.e. active, leave, termed). It helps identify when an employee is re-hired, and provides termination type
prep sensitive bamboohr_promotions_xf employee_id by promotion_date and compensation_sequence This model identifies all individuals that were promoted and the compensation change associated to the promotion. The total compensation change is equal to the change in compensation (from bamboohr_compensation model) times the pay frequency and currency conversion at time of promotion + change in OTE(USD) at time of promotion. In the case the team member is hourly, we use the bamboohr_currency_conversion table.
prep sensitive bamboohr_id_employee_number_mapping employee_id This model is the canonical source mapping bamboo employee numbers with employee IDs. It includes all employees for all time. The model also captures demographic information, and removes test accounts.
prep sensitive bamboohr_separations employee_id Provides a report of all separated team members.
prep sensitive workday_terminations employee_id Provides the termination reason, and exit impact to allow the People Analytics team to accurately report on termination data
prep workday blended_directory_source employee_id by uploaded_at and source_system Daily upload of employee data used for downstream models. Helpful source for auditing any data issues in Snowflake

Greenhouse Data Models

Click to expand
Database Schema Table Name Data Grain Description Notes
prod workspace_people rpt_hires application_id This is a report specifically for talent acquisition and counts accepted offers as hires.
prep greenhouse greenhouse_application_stages_source application_id by stage_id and stage_entered_on This table is Historical activity of all stages an application can be in Each row represents a stage that an application can be in, and the timestamp that the application entered and exited the stage Things to note: This table contains a row for each stage that an application can be in (taken from the job that the application is on). Thus, there may be rows for stages that an application has yet to reach, or will not reach (if the application was rejected).
prep sensitive greenhouse_recruiting_xf application_id This is shows all applications submitted and ties in offer data, greenhouse department (the associated division), and source data to be in 1 place.
Last modified September 23, 2024: Fix broken links (d748cf8c)