Marketing Analytics Data Overview and General Information
Data Modeling Best Practices
As one of the core principles of the CDT’s data modeling practices, Marketing Analytics follows the guidelines for creating Trusted Data wherever and whenever possible. Essentially, Trusted Data (TD) means that the models, at every step of the way, adhere to all coding and style guides and the information/results have been fully vetted as accurate and reliable. For you, the consumer, it means that you know the numbers and results you are viewing are correct.
Data modeling is both an artform and a science. There are numerous methodologies and guides to perform this task, and CDT has provided the Marketing Analytics team (and all other embedded data teams) with a number of very helpful tools - ranging from the SQL Style Guide, which outlines the ideal formats and functions for all SQL code in the data warehouse; to the Kimball Methodology, the framework for the ways that the various levels of models should be constructed and fit together to form the final tables ingested into Tableau for analysis.
The Kimball Methodology outlines key “stages” of table creation along the path from the raw source data to the fully ingestible mart/report tables most heavily used in Tableau and other BI tools.
- Raw/Source
- We start at the source layer, which is the system housing the data that needs to be analyzed (i.e. Salesforce, Marketo, Iterable, etc.)
- The raw fields for each object (i.e. Opportunity, Account, Contact, etc.) are ingested, formatted, and named clearly and accurately in the raw/source table.
- Prep
- The raw/source fields are next pulled into a prep layer. It is here that base-level calculations, transformations, and other logical modeling steps are performed on the raw data to prepare it for the next stage in its data journey.
- The prep layer is often the most complex of the layers, as it is the core repository for the primary logic performed on the raw data.
- Dim/Fct
- The prep models are split into two distinct tables
- Fct: This table has a single unique ID as well as numerous other joinable IDs for a given object/data set. It is here that dates, IDs, and measurements (the numbers involved in business decisions) are stored and prepared.
- Dim: This table houses a single unique ID for a given object/data set and then all of the descriptive information of a single ID-row of a fct table.
- Each ID in a fct will have a single Dim table, and each Dim table will only have a single ID within it.
- Dims are joined together by joining to the fct table that they share in common.
- The prep models are split into two distinct tables
- Mart
- The mart layer combines the dims and fct models together to become a complete whole.
- Marts have very little logic or calculations within them as they serve, primarily, to present a queryable final table for a given object or business process.
- Report (rpt)
- Report tables are built off of marts (usually). They are filtered offshoots of the whole mart data set and serve as a queryable table for a refined business question or process.
- Example: You may have a mart of all opportunities in Salesforce, while having a report/rpt of all closed won opportunities - pre-filteering the mart’s results provides a smaller and more agile data set when you know you only want a portion of the mart’s whole.
Tech Stack Overview
CDT’s and thus Marketing Analytics’ data pipeline tech stack is straightforward, efficient, and effective.
It starts with extracting the raw/source data from each source system, as shown in the CDT Sources table. The extraction happens with a variety of tools based on which tool best supports the specific source system. In Marketing we primarily rely on Stitch (SFDC) and FiveTran (Marketo and Iterable and others). The extraction tool grabs all specified columns for each synced table/object and creates a raw-level table of those fields.
These raw tables are extracted into Snowflake, the repo/hub of all data at GitLab.
The next layer is the data build tool (dbt), which is a transformation layer of the modeling process. In dbt, SQL and other code is written to further transform, refine, calculate, and concatenate various fields and logic sets to create the prep layer as mentioned above.
The finished tables that dbt creates are once again pushed to Snowflake, where the dbt-created tables will live, accessible in the CDT data catalog, otherwise referred to as the Data Warehouse.
The last step in the process is to ingest the finished dbt-created, snowflake-housed tables into our BI tool of choice, which is usually Tableau - where dashboards and charts are created from the various tables, to allow easy analyses.
Data Flow
Source System | Source Object | Source Table | Prep Table | Dim Table | Fct Table | Mart Table |
---|---|---|---|---|---|---|
SFDC | Account | sfdc_account_source | prep_crm_account | dim_crm_account | fct_crm_account | mart_crm_account |
SFDC | Bizible Attribution Touchpoint | sfdc_bizible_attribution_touchpoint_source | prep_crm_attribution_touchpoint | dim_crm_touchpoint | fct_crm_attribution_touchpoint | mart_crm_attribution_touchpoint |
SFDC | Bizible Person | sfdc_bizible_person_source | prep_crm_person | dim_crm_person | fct_crm_person | mart_crm_person |
SFDC | Bizible Touchpoint | sfdc_bizible_touchpoint_source | prep_crm_touchpoint | dim_crm_touchpoint | fct_crm_touchpoint | mart_crm_touchpoint |
SFDC | Campaign | sfdc_campaign_source | prep_campaign | dim_campaign | fct_campaign | N/A |
SFDC | Campaign Member | sfdc_campaign_member_source | prep_campaign | dim_campaign | fct_campaign | N/A |
SFDC | Contact | sfdc_contact_source | prep_crm_person | dim_crm_person | fct_crm_person | mart_crm_person |
SFDC | Event | sfdc_event_source | prep_crm_event | dim_crm_event | fct_crm_event | mart_crm_event |
SFDC | Lead | sfdc_lead_source | prep_crm_person | dim_crm_person | fct_crm_person | mart_crm_person |
SFDC | Opportunity | sfdc_opportunity_source | prep_crm_opportunity | dim_crm_opportunity | fct_crm_opportunity | mart_crm_opportunity |
SFDC | Task | sfdc_task_source | prep_crm_task | dim_crm_task | fct_crm_task | mart_crm_task |
SFDC | Users | sfdc_users_source | prep_crm_user | dim_crm_user | N/A | N/A |
Marketo | Activitiy-specific source tables | Marketo model source | prep_marketo_activity | dim_marketo_activity | fct_marketo_activity | mart_marketo_activity |
Marketo | Lead/Person | marketo_lead_source | prep_marketo_person | N/A | N/A | mart_marketo_person |
Iterable | Campaign History | iterable_campaign_history_source | prep_iterable_campaign | WIP | WIP | WIP |
Iterable | Campaign List History | iterable_campaign_list_history_source | prep_iterable_campaign | WIP | WIP | WIP |
Iterable | Campaign Metrics | iterable_campaign_metrics_source | prep_iterable_campaign_metrics | WIP | WIP | WIP |
Iterable | Campaign Suppression List History | iterable_campaign_suppression_list_history_source | prep_iterable_campaign | WIP | WIP | WIP |
Iterable | Channel | iterable_channel_source | prep_iterable_user | WIP | WIP | WIP |
Iterable | Email Link Param History | iterable_email_link_param_history_source | prep_iterable_email_template | WIP | WIP | WIP |
Iterable | Email Template History | iterable_email_template_history_source | prep_iterable_email_template | WIP | WIP | WIP |
Iterable | Event | iterable_event_source | prep_iterable_event | WIP | WIP | WIP |
Iterable | List | iterable_list_source | prep_iterable_list | WIP | WIP | WIP |
Iterable | List User History | iterable_list_user_history_source | prep_iterable_list | WIP | WIP | WIP |
Iterable | Message Type | iterable_message_type_source | prep_iterable_message_type | WIP | WIP | WIP |
Iterable | Template History | iterable_template_history_source | prep_iterable_email_template | WIP | WIP | WIP |
Iterable | User History | iterable_user_history_source | prep_iterable_user | WIP | WIP | WIP |
Iterable | User Unsubscribed Channel | iterable_user_unsubscribed_channel_source | prep_iterable_user | WIP | WIP | WIP |
Iterable | User Unsubscribed Message | iterable_user_unsubscribed_message_type_source | prep_iterable_message_type | WIP | WIP | WIP |
Marketing Data Models
All GitLab data models are grouped and categorized in snowflake/dbt through a series of folders and hierarchy. Below are the primary Marketing data models, there folders, links, and descriptions. We’ve added another lvel of organization, the Grouping
field, which is a custom grouping of the data models based on their usage and lineage. The list of groupings below, preceding the data model inventory table, will you give you a brief understanding of what each grouping represents.
Groupings
Grouping | Description |
---|---|
Event Performance | Models to calculate and analyze Event data streams. |
L2R | Models built off the Lead to Revenue model. L2R is a consolidation of SFDC PErsons (leads/contacts), Accounts, Opportunities, and Bizible Touchpoints. |
Marketo | Models built off of Marketo source data. These are often used in conjuction with the L2R and Presentation Layer Model groupings for additional fields and information. |
Presentation Layer Model | Models used in active and public Tableau Dashboards. These models are the most commonly used and presented for analysis. |
SFDC | Models built off of the SFDC Source tables. These are often used in conjuction with the L2R and Presentation Layer Model groupings for additional fields and information. |
Web | Models built to model out web traffic data (top of funnel). |
Models
ec253af0
)