CustomersDot revenue impacting error monitoring & improvements

This page contains information related to upcoming products, features, and functionality. It is important to note that the information presented is for informational purposes only. Please do not rely on this information for purchasing or planning purposes. The development, release, and timing of any products, features, or functionality may be subject to change or delay and remain at the sole discretion of GitLab Inc.
Status Authors Coach DRIs Owning Stage Created
proposed shreyasagarwal aish.sub vshumilo ppalanikumar jameslopez devops fulfillment 2024-09-02

Summary

CustomersDot is a powerful platform designed for customers to seamlessly purchase and manage their subscriptions. For SaaS applications, CustomersDot communicates with GitLab.com through API calls to update subscription details in real-time. For self-managed (SM) applications and dedicated instances, it generates a license for installation on the customer’s system. The latest version enhances this process by enabling license installation through the cloud, offering greater convenience and flexibility.

CustomersDot integrates with several third-party tools, such as Zuora and Salesforce. Zuora is an end-to-end order-to-revenue SaaS platform that drives GitLab’s quoting, billing, revenue collection, revenue recognition, and subscription metrics reporting. It also supports the back-office teams responsible for these operations.

Motivation

From a SOX compliance perspective, it is crucial to identify and track any errors that occur during the purchase, update and provisioning of subscriptions within the CustomersDot application. Currently, we utilize the Provision Tracking System to monitor the provisioning process after a subscription is created or updated. However, there is a tracking gap for certain revenue recognition-impacting errors that occur before a subscription purchase or update. It is crucial to address this gap to ensure comprehensive error tracking and improve the accuracy of revenue recognition processes.

Goals

The goal is to deliver a tracking and monitoring system for revenue-impacting errors that occur during the purchase, update, renewal, and reconciliation flows. This will cover the following aspects of CustomersDot integrations (Zuora, Salesforce for Q3 & others in Q4 2025).

A spike issue for the proposal can be found at gitlab-org/customers-gitlab-com#10430.

  • Enhanced visibility for stakeholders (Accessing different systems to verify data)
  • Document edge cases
  • Automated error monitoring/Alerting system
  • Document fixes or processes for 3rd party errors (like Zuora unavailability etc)

Proposal

The Fulfillment Platform group currently handles the manual process of monitoring the described errors. Each week, an engineer is assigned to a Job monitoring issue, where they must gather a collection of errors daily by running a script. These errors are then reviewed individually to identify actionable items, such as those not related to user validation, payment method validation, or location issues. The engineer manually resolves these errors with the use of console or Admin panel.

To improve efficiency and results, we need to automate this process.

  • Develop a new system for error tracking to improve the monitoring of provisioning and revenue recognition errors.
    • Create a new database table to store error logs.
    • Add a new view in the admin panel to display and manage these errors.
  • Enhance the codebase to automatically store relevant errors in the new database.
  • Implement Slack notifications to alert the team whenever a new error is encountered.
  • Automate issue creation:
    • Generate a daily issue summarizing new errors encountered each day.
    • Generate a weekly issue summarizing all errors encountered throughout the week.
  • Set up a cron job to run daily as an audit, tracking reconciliations and renewals that failed to process, and add them to the error log.

Design and implementation details

Proposed DB schema

erDiagram

  "ErrorMonitorings" {
    integer id PK
    string message "null:false"
    string code
    string error_type
    string status "null:false"
    string gitlab_issue_link
    text backtrace
    jsonb payload
    text notes
    datetime created_at "null:false"
    datetime updated_at "null:false"
  }

The error_monitorings table is designed to store meaningful errors that are valuable for monitoring as they occur. Most of the columns are self explanatory. Taking an example of the current logging in google cloud.

  • code -> VALIDATION_ERROR
  • error_type -> Subscription update failed
  • message -> This code is not valid. Try re-entering the code from your email.

Currently, we are tagging error messages with fulfillment_job_monitoring within the codebase and using GCloud to look up and resolve them individually. Moving forward, the plan remains the same: we will begin by logging errors with the fulfillment_job_monitoring tag into the database.

Errors will continue to be addressed individually. As soon as an error is encountered, we will send an immediate notification to the designated Slack channel, probably through the background job, to ensure timely resolution.

Right now, most of the errors being tagged are considered noise. In the future, once the error list is addressed, unnecessary noise will be filtered out and not saved in the database.

Error states

The status column stores the state of the error encountered.

flowchart TD
    A[Error encounterd] -->|Save it in database| B(State - open)
    B --> |Nothing needs to be done| D[Resolved]
    B --> E[State - Needs attention]
    E --> |Needs to look into the error| C[Create/Link GitLab Issue, optional]
    C --> |Issue being worked on| F[State - In Progress]
    F --> |Issue Closed| D

Workflow

sequenceDiagram
  autonumber
  actor C as Customer
  participant CD as CustomersDot
  participant Z as Zuora
  participant ER as Error reporting System
  participant S as Slack
  participant G as GitLab
  actor E as Engineer

  C->>CD: Initiates subscription purchase/update
  CD->>Z: API callback for subscription purchase/update
  Z-->>CD: Returns error
  CD-->>C: Indicates system error
  CD->>ER: Creates error_monitoring record within database
  ER->>S: Feeds Slack notification
  E->>CD: Visits admin page & debug the error
  E->>G: Creates GitLab issue (optional)
  E->>ER: Changes Status of error_monitoring record
  par
    E->>G: Triage issue and status update
    G->>ER: Auto-updates error_monitoring record on resolution
  end

Places of interest to add entries within error_monitorings table?

  1. When a customer is purchasing a subscription.
  2. When a customer is updating a subscription.
  3. When a customer upgrades the subscription.
  4. When syncing the product catalog to local cache
  5. When the reconciliation is being performed
  6. When Salesforce entities are getting created/updated
  7. When the subscription is getting auto renewed
Last modified October 11, 2024: Rename table to error_monitorings (30c17789)