CustomersDot revenue impacting error monitoring & improvements

This page contains information related to upcoming products, features, and functionality. It is important to note that the information presented is for informational purposes only. Please do not rely on this information for purchasing or planning purposes. The development, release, and timing of any products, features, or functionality may be subject to change or delay and remain at the sole discretion of GitLab Inc.
Status: proposed
Authors, Coach, DRIs: shreyasagarwal, aish.sub, vshumilo, ppalanikumar, jameslopez
Owning Stage: devops fulfillment
Created: 2024-09-02

Summary

CustomersDot is the platform customers use to purchase and manage their GitLab subscriptions. For SaaS subscriptions, CustomersDot calls GitLab.com APIs to update subscription details in real time. For self-managed and Dedicated instances, it generates a license that the customer installs on their own system. The latest version improves this process by supporting license installation through the cloud, which is more convenient and flexible for customers.

CustomersDot integrates with several third-party tools, such as Zuora and Salesforce. Zuora is an end-to-end order-to-revenue SaaS platform that drives GitLab’s quoting, billing, revenue collection, revenue recognition, and subscription metrics reporting. It also supports the back-office teams responsible for these operations.

Motivation

From a SOX compliance perspective, we must identify and track any errors that occur while subscriptions are purchased, updated, and provisioned in CustomersDot. Currently, the Provision Tracking System monitors the provisioning process after a subscription is created or updated. However, some errors that affect revenue recognition occur before a subscription is purchased or updated, and these are not tracked. Closing this gap is necessary for comprehensive error tracking and accurate revenue recognition.

Goals

The goal is to deliver a tracking and monitoring system for revenue-impacting errors that occur during the purchase, update, renewal, and reconciliation flows. For the CustomersDot integrations (Zuora and Salesforce in Q3, others in Q4 2025), it will cover the following aspects:

  • Enhanced visibility for stakeholders (accessing different systems to verify data)
  • Documentation of edge cases
  • An automated error monitoring and alerting system
  • Documented fixes or processes for third-party errors, such as Zuora unavailability

A spike issue for the proposal can be found at gitlab-org/customers-gitlab-com#10430.

Proposal

The Fulfillment Platform group currently monitors these errors manually. Each week, an engineer is assigned to a job monitoring issue and runs a script every day to gather the errors. The engineer reviews the errors one by one to find the actionable ones, meaning errors that are not caused by user validation, payment method validation, or location issues. The engineer then resolves them manually through the console or the admin panel.

To reduce this manual effort and make monitoring more reliable, we need to automate the process:

  • Develop a new system for error tracking to improve the monitoring of provisioning and revenue recognition errors.
    • Create a new database table to store error logs.
    • Add a new view in the admin panel to display and manage these errors.
  • Enhance the codebase to automatically store relevant errors in the new table.
  • Implement Slack notifications to alert the team whenever a new error is encountered.
  • Automate issue creation:
    • Generate a daily issue summarizing new errors encountered each day.
    • Generate a weekly issue summarizing all errors encountered throughout the week.
  • Set up a daily cron job that audits reconciliations and renewals that failed to process and adds them to the error log.
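The daily audit in the last item could look roughly like the following Python sketch. It assumes each scheduled reconciliation or renewal can be loaded as a record with a due date and a processed flag. `ScheduledJob`, `audit_unprocessed`, and the `UNPROCESSED_*` codes are hypothetical names, not existing CustomersDot code.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class ScheduledJob:
    """Hypothetical stand-in for a scheduled reconciliation or renewal."""
    subscription_name: str
    flow: str  # "reconciliation" or "renewal"
    due_on: date
    processed: bool


def audit_unprocessed(jobs, today):
    """Return error-log entries for jobs due before `today` that never ran."""
    entries = []
    for job in jobs:
        if job.due_on < today and not job.processed:
            entries.append({
                # Hypothetical error code; real codes would come from CustomersDot.
                "code": f"UNPROCESSED_{job.flow.upper()}",
                "error_type": job.flow,
                "message": (
                    f"{job.flow} for {job.subscription_name} due on "
                    f"{job.due_on.isoformat()} was not processed"
                ),
                "status": "needs_attention",
            })
    return entries


today = date(2025, 1, 10)
jobs = [
    ScheduledJob("A-S0001", "renewal", today - timedelta(days=1), processed=False),
    ScheduledJob("A-S0002", "reconciliation", today - timedelta(days=1), processed=True),
    ScheduledJob("A-S0003", "renewal", today, processed=False),  # due today, checked tomorrow
]
for entry in audit_unprocessed(jobs, today):
    print(entry["code"], "-", entry["message"])
```

Passing `today` explicitly keeps the check deterministic and easy to test, whatever scheduler ends up running it.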

Design and implementation details

Proposed DB schema


The error_monitorings table is designed to store, as they occur, errors that are valuable for monitoring. Most of the columns are self-explanatory. For example, an error from the current logging in Google Cloud maps to the columns as follows:

  • code -> VALIDATION_ERROR
  • error_type -> This code is not valid. Try re-entering the code from your email.
  • message -> Subscription update failed
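The following Python sketch is for illustration only and models a row with the columns discussed above. `code`, `error_type`, `message`, and `status` come from this proposal. `issue_url` and `created_at` are assumed extra columns. In CustomersDot, the real table would presumably be defined as a Rails migration rather than a Python dataclass.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional


@dataclass
class ErrorMonitoring:
    """Sketch of one error_monitorings row (not the actual schema)."""
    # Columns named in this proposal.
    code: str
    error_type: str
    message: str
    status: str = "needs_attention"
    # Assumed columns, added for illustration.
    issue_url: Optional[str] = None
    created_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )


record = ErrorMonitoring(
    code="VALIDATION_ERROR",
    error_type="This code is not valid. Try re-entering the code from your email.",
    message="Subscription update failed",
)
print(record.code, record.status)  # VALIDATION_ERROR needs_attention
```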

Currently, we tag error messages with fulfillment_job_monitoring in the codebase and use Google Cloud (GCloud) logging to look them up and resolve them individually. Moving forward, we will keep the same tag and begin by also logging errors tagged fulfillment_job_monitoring to the database.
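Because errors already carry the fulfillment_job_monitoring tag, one low-effort option is to route tagged log entries to the new table at the logging layer. The Python sketch below shows the idea with a custom `logging.Handler`. The `tags` and `code` fields passed through `extra`, and the list used as a store, are assumptions for illustration. They do not describe how CustomersDot logs today.

```python
import logging

TAG = "fulfillment_job_monitoring"


class ErrorMonitoringHandler(logging.Handler):
    """Copy log records that carry the monitoring tag into an error store.

    `store` is a plain list here. In CustomersDot it would be the
    error_monitorings table (assumption).
    """

    def __init__(self, store):
        super().__init__(level=logging.ERROR)
        self.store = store

    def emit(self, record):
        # Tags arrive through `extra={"tags": [...]}`, an assumed convention.
        if TAG not in getattr(record, "tags", ()):
            return
        self.store.append({
            "code": getattr(record, "code", None),
            "error_type": getattr(record, "error_type", None),
            "message": record.getMessage(),
            "status": "needs_attention",
        })


store = []
logger = logging.getLogger("customers_dot")
logger.addHandler(ErrorMonitoringHandler(store))

logger.error("Subscription update failed",
             extra={"tags": [TAG], "code": "VALIDATION_ERROR"})
logger.error("Unrelated failure")  # untagged, so it is not stored
print(len(store))  # 1
```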

Errors will continue to be addressed individually. As soon as an error is encountered, a notification will be sent to the designated Slack channel, likely through a background job, so that the error can be resolved promptly.
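A background job keeps the notification off the request path. The code that records the error only enqueues it, and a worker delivers the message. The sketch below uses Python's `queue` and `threading` in place of CustomersDot's job system, which is an assumption. It builds the JSON body for a Slack incoming webhook, which accepts a `text` field. The `send` callable is a placeholder. In practice it would POST the body to the channel's webhook URL.

```python
import json
import queue
import threading


def build_slack_payload(error):
    """JSON body for a Slack incoming webhook; only the `text` field is set."""
    text = (
        f"New revenue-impacting error `{error['code']}`: "
        f"{error['message']} (status: {error['status']})"
    )
    return json.dumps({"text": text})


def notification_worker(jobs, send):
    """Background worker: drain the queue and hand each payload to `send`."""
    while True:
        error = jobs.get()
        try:
            if error is None:  # sentinel value stops the worker
                return
            send(build_slack_payload(error))
        finally:
            jobs.task_done()


sent = []  # placeholder for the webhook POST
jobs = queue.Queue()
worker = threading.Thread(target=notification_worker, args=(jobs, sent.append))
worker.start()

# The code path that records the error only enqueues and returns immediately.
jobs.put({"code": "VALIDATION_ERROR",
          "message": "Subscription update failed",
          "status": "needs_attention"})
jobs.put(None)
worker.join()
print(json.loads(sent[0])["text"])
```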

Right now, most of the tagged errors are noise. Once the current list of errors has been addressed, unnecessary noise will be filtered out and no longer saved to the database.
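Once the noisy categories are known, filtering can happen before anything is saved. The sketch below assumes that noise is identified by error code. The listed codes are hypothetical. They mirror the user validation, payment method validation, and location errors that the manual review currently skips.

```python
# Hypothetical codes for errors caused by user input rather than by the system.
NOISE_CODES = frozenset({
    "USER_VALIDATION_ERROR",
    "PAYMENT_METHOD_VALIDATION_ERROR",
    "LOCATION_ERROR",
})


def is_actionable(error):
    """An error is worth saving unless its code is on the noise list."""
    return error["code"] not in NOISE_CODES


def errors_to_save(errors):
    """Keep only actionable errors, preserving their original order."""
    return [error for error in errors if is_actionable(error)]


incoming = [
    {"code": "PAYMENT_METHOD_VALIDATION_ERROR", "message": "Card declined"},
    {"code": "ZUORA_TIMEOUT", "message": "Subscription update failed"},
    {"code": "LOCATION_ERROR", "message": "Unsupported country"},
]
print([error["code"] for error in errors_to_save(incoming)])  # ['ZUORA_TIMEOUT']
```

Keeping the noise list as data rather than as scattered conditionals makes it easy to extend as more noise is identified.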

Error states

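The status column stores the state of the error encountered. An error moves through the following states:

  • Needs Attention: when an error is encountered, it is saved in the database in this state.
  • Ignored: set from Needs Attention when nothing needs to be done.
  • In Progress: set from Needs Attention when the error is being looked into. A GitLab issue can optionally be created or linked.
  • Resolved: set from In Progress when the error is resolved, for example when the linked GitLab issue is closed.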
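A small transition table can enforce the allowed status changes. This Python sketch encodes one reading of the states above. Needs Attention can move to Ignored or In Progress, and In Progress can move to Resolved. Ignored and Resolved are treated as final. This proposal does not say whether, for example, an ignored error can be reopened, so the table itself is an assumption.

```python
from enum import Enum


class ErrorStatus(Enum):
    NEEDS_ATTENTION = "needs_attention"
    IGNORED = "ignored"
    IN_PROGRESS = "in_progress"
    RESOLVED = "resolved"


# Assumed transition table; Ignored and Resolved are treated as final states.
TRANSITIONS = {
    ErrorStatus.NEEDS_ATTENTION: {ErrorStatus.IGNORED, ErrorStatus.IN_PROGRESS},
    ErrorStatus.IN_PROGRESS: {ErrorStatus.RESOLVED},
    ErrorStatus.IGNORED: set(),
    ErrorStatus.RESOLVED: set(),
}


class InvalidTransition(ValueError):
    """Raised when a status change is not allowed by TRANSITIONS."""


def transition(current, new):
    """Return `new` if moving from `current` to `new` is allowed."""
    if new not in TRANSITIONS[current]:
        raise InvalidTransition(f"{current.value} -> {new.value} is not allowed")
    return new


status = ErrorStatus.NEEDS_ATTENTION
status = transition(status, ErrorStatus.IN_PROGRESS)
status = transition(status, ErrorStatus.RESOLVED)
print(status.value)  # resolved

try:
    transition(ErrorStatus.IGNORED, ErrorStatus.IN_PROGRESS)
except InvalidTransition as exc:
    print(exc)  # ignored -> in_progress is not allowed
```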

Workflow

Participants: Customer, CustomersDot, Zuora, error reporting system, Slack, GitLab, and Engineer.

  1. The customer initiates a subscription purchase or update.
  2. CustomersDot calls the Zuora API to purchase or update the subscription.
  3. Zuora returns an error.
  4. CustomersDot indicates a system error to the customer.
  5. CustomersDot creates an error_monitoring record in the database.
  6. The error reporting system sends a Slack notification.
  7. An engineer visits the admin page and debugs the error.
  8. The engineer creates a GitLab issue (optional).
  9. The engineer changes the status of the error_monitoring record.
  10. The issue is triaged and its status is updated in GitLab.
  11. The error_monitoring record is updated automatically on resolution.
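The last step of the workflow updates the error_monitoring record automatically on resolution. A GitLab issue webhook could drive that step. GitLab issue events carry `"object_kind": "issue"` and an `object_attributes.action` such as `"close"`. The sketch below makes two assumptions: each record stores the URL of its linked issue, and an in-memory dictionary stands in for the database. `handle_issue_event` is a hypothetical name.

```python
def handle_issue_event(payload, records_by_issue_url):
    """Mark the linked error record resolved when its GitLab issue is closed.

    Returns the updated record, or None if the event is ignored.
    """
    if payload.get("object_kind") != "issue":
        return None
    attributes = payload.get("object_attributes", {})
    if attributes.get("action") != "close":
        return None
    record = records_by_issue_url.get(attributes.get("url"))
    # Only errors that were being worked on can be resolved this way.
    if record is None or record["status"] != "in_progress":
        return None
    record["status"] = "resolved"
    return record


# In-memory stand-in for error_monitorings rows, keyed by the linked issue URL.
issue_url = "https://gitlab.example.com/group/project/-/issues/1"
records = {issue_url: {"code": "VALIDATION_ERROR", "status": "in_progress"}}

event = {
    "object_kind": "issue",
    "object_attributes": {"action": "close", "url": issue_url},
}
handle_issue_event(event, records)
print(records[issue_url]["status"])  # resolved
```

A real endpoint should also verify the `X-Gitlab-Token` header, which GitLab sends with the secret token configured on the webhook.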

Places to add entries to the error_monitorings table

  1. When a customer purchases a subscription.
  2. When a customer updates a subscription.
  3. When a customer upgrades a subscription.
  4. When the product catalog is synced to the local cache.
  5. When a reconciliation is performed.
  6. When Salesforce entities are created or updated.
  7. When a subscription is auto-renewed.
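All of these points need the same recording logic, so it could be written once and wrapped around each flow. The Python sketch below uses a context manager that records any exception and then re-raises it, which leaves existing error handling unchanged. Several details are assumptions: the name `track_revenue_errors`, the `UNHANDLED_ERROR` default code, storing the exception class name in `error_type`, and the list that stands in for the table.

```python
from contextlib import contextmanager

# Stand-in for the error_monitorings table (assumption for illustration).
error_log = []


@contextmanager
def track_revenue_errors(flow, code="UNHANDLED_ERROR"):
    """Record any exception raised inside the block, then re-raise it."""
    try:
        yield
    except Exception as exc:
        error_log.append({
            "code": code,
            "error_type": type(exc).__name__,  # assumed use of this column
            "message": f"{flow} failed: {exc}",
            "status": "needs_attention",
        })
        raise


def sync_product_catalog():
    """Hypothetical flow that fails the way a Zuora outage might."""
    raise TimeoutError("Zuora did not respond")


try:
    with track_revenue_errors("product catalog sync"):
        sync_product_catalog()
except TimeoutError:
    pass  # the caller's existing error handling still runs

print(error_log[0]["message"])  # product catalog sync failed: Zuora did not respond
```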