Monitor:Observability Group

Who we are?

The Observability group is part of the GitLab Monitor stage and builds the GitLab Observability product.

Team members

Name              Role
Nicholas Klick    Engineering Manager, Monitor:Observability
Ankit Bhatnagar   Staff Backend Engineer, Monitor:Observability
Arun Sori         Senior Backend Engineer, Monitor:Observability
Daniele Rossetti  Senior Frontend Engineer, Monitor:Observability
Jiaan Louw        Senior Frontend Engineer, Monitor:Product Analytics
Mat Appelman      Principal Engineer, Monitor
Max Woolf         Staff Backend Engineer, Monitor:Platform Insights
Robert Hunt       Staff Frontend Engineer, Monitor:Product Analytics

Stable counterparts

Name                Role
Principal Engineer  Principal Engineer, Monitor
Ottilia Westerlund  Security Engineer, Fulfillment (Fulfillment Platform, Subscription Management), Security Risk Management (Security Policies, Threat Insights), Monitor (Observability), Plan (Product Planning), AI-powered (Duo Chat, Duo Workflow, AI Framework, AI model validation, Custom models)

Technical Architecture

Architecture Blueprints

Architecture Documentation

ClickHouse Datastore

Observability and analytics features have big-data and insert-heavy requirements that are not a good fit for Postgres or Redis. ClickHouse was selected as a good fit for these features' requirements. ClickHouse is an open-source, column-oriented database management system. It is attractive for these use cases because it can efficiently filter, aggregate, and sum across large numbers of rows. ClickHouse is not intended to replace Postgres or Redis in GitLab’s stack.

We initially managed our own self-hosted ClickHouse instance, but decided to migrate to ClickHouse Cloud so the team can move more quickly by offloading maintenance and scalability to ClickHouse.

Learn more: ClickHouse Datastore Working Group

How we work?

Async Standups

We have Slack-based standups (using Geekbot) on Wednesdays and retrospectives on Fridays. We use these async standups to communicate what we have accomplished, any current blockers, and what we plan to work on next.

Async Updates

Every Friday, the EM provides an async update of the team’s progress, following the Ops sub-department async updates process.

These updates are published as issues in the general project.

Updates and highlights from all teams in Ops are collected automatically here, grouped by week / month / quarter.

Meetings

  • Weekly Team Sync: These are focused on organizing ongoing work or specific efforts such as rollouts or bigger initiatives.
  • Bi-monthly social hour: This meeting is non-work related and helps the team socialize and get to know each other better.
  • Team member coffee chats: Each team member should schedule a coffee chat with every other team member roughly every 4-6 weeks. Feel free to discuss work or non-work topics. If timezones are an issue, find another way to connect, such as an async Slack thread to check in. The goal is to get to know your other team members on a 1:1 basis.
  • Dev Syncs: These are developer-organized sync meetings where ICs can meet and discuss technical issues or organize technical work amongst themselves without requiring the presence of an EM.

Communication

We use several Slack channels to organize ourselves:

How we do planning?

We are following the monthly milestone cadence. Work is organized into epics and assigned to the relevant milestones.

The milestone starting date is defined in the gitlab.org group milestones. It changes every month, according to the new GitLab release calendar.

Milestone Planning timeline:

  • 10 days before milestone starting date: Planning draft issue is created by PM/EM, with high level milestone goals.
  • 8 days before milestone starting date: Planning draft is shared with team. Individual contributors recommend epics and issues related to these goals or carried over from previous milestones.
  • 5 days before milestone starting date: Planning is reviewed during team sync meeting.
  • On milestone starting date: Milestone goals and related epics and issues should be finalized and prioritized. All planned work can be seen on the Milestone Board. Previous milestone issues are moved to the new milestone or backlog.
  • During the milestone, we analyze progress and reprioritize as needed.
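The offsets in the timeline above can be turned into calendar dates with simple date arithmetic. The following is a minimal sketch: the function name, the step labels, and the example start date are illustrative assumptions, and the real start date comes from the gitlab.org group milestones.

```python
from datetime import date, timedelta

# Offsets (days before the milestone starting date) taken from the timeline above.
PLANNING_OFFSETS = {
    "Planning draft issue created by PM/EM": 10,
    "Planning draft shared with team": 8,
    "Planning reviewed during team sync": 5,
    "Milestone goals finalized and prioritized": 0,
}


def planning_timeline(milestone_start: date) -> dict[str, date]:
    """Return the calendar date of each planning step for a given start date."""
    return {
        step: milestone_start - timedelta(days=days_before)
        for step, days_before in PLANNING_OFFSETS.items()
    }


if __name__ == "__main__":
    # Hypothetical start date, for illustration only.
    for step, day in planning_timeline(date(2024, 6, 15)).items():
        print(f"{day.isoformat()}  {step}")
```

Because the offsets are calendar days, a step may land on a weekend; the sketch does not adjust for that.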

How to find something to work on?

Normally at the beginning of the Milestone the EM will discuss an overview of the work and what relevant areas you will focus on. Sometimes issues will already be assigned to you before the Milestone begins.

If you are ever looking for additional issues to work on:

  1. Look at the Platform Insight Milestone board
  2. Identify an issue that is unassigned.
  3. Assign yourself to the issue.
  4. Add the ~workflow::in dev label to the issue.
  5. If the scope or description is unclear, connect with the EM and/or PM for clarification or (if feeling confident) groom the issue yourself and proceed.
  6. Begin working on the issue.
  7. Once all relevant MRs are merged, set the ~workflow::verification label.
    • Ensure any MRs do not auto-close issues. (Use Relates to #11111 rather than Closes #11111 in MR descriptions.)
  8. Verify the changes and comment on the issue which environment you used for verification, for example Verified on production.
  9. Close the issue! 🎉
  10. Repeat.
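The note in step 7 about not auto-closing issues can be checked mechanically before an MR is merged. The sketch below is an assumption-laden illustration, not GitLab's own logic: the regular expression is a simplified approximation of closing keywords followed by an issue reference, and `auto_closes_issue` is a hypothetical helper name.

```python
import re

# Simplified approximation of auto-closing keywords (an assumption; the
# closing pattern GitLab actually applies is configurable and broader).
CLOSING_KEYWORD = re.compile(
    r"\b(?:clos(?:e[sd]?|ing)|fix(?:e[sd]|ing)?|resolv(?:e[sd]?|ing)"
    r"|implement(?:s|ed|ing)?)\s+#\d+",
    re.IGNORECASE,
)


def auto_closes_issue(mr_description: str) -> bool:
    """Return True if the description appears to auto-close an issue."""
    return CLOSING_KEYWORD.search(mr_description) is not None


if __name__ == "__main__":
    for text in ("Closes #11111", "Relates to #11111"):
        verdict = "would auto-close" if auto_closes_issue(text) else "safe"
        print(f"{text!r}: {verdict}")
```

A description that uses Relates to #11111 passes this check, so the issue stays open until it has been verified and closed by hand.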

How to enable Observability Beta for a customer?

To enable access to Logs, Tracing, and Metrics Beta for a certain customer, follow this process:

For SaaS:

  • Beforehand, make sure you have the right access and permissions to run ChatOps commands, as detailed in this page.
  • Ask the customer for their top-level group name (example: gitlab-org for https://gitlab.com/gitlab-org/).
  • In #production, run the following command to enable the feature flag for this group (replace gitlab-org with the customer’s group name):
/chatops run feature set --group=gitlab-org observability_features true

To see the list of groups that have already been enabled, you can run the following command:

/chatops run feature get observability_features

The list returns group IDs and not group names though. To know a group’s ID, browse to the group’s page (example), open the “…” menu on the top-right of the page and select “Copy group ID”. If you don’t have access to the group, ask the customer to do it.
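When a customer sends a group URL rather than a group name, the enable command above can be assembled with a small helper. This is a minimal sketch: `top_level_group` and `enable_command` are hypothetical helper names, and the script only formats the Slack message text; it does not call ChatOps.

```python
from urllib.parse import urlparse

# Feature flag name taken from the commands above.
FEATURE_FLAG = "observability_features"


def top_level_group(group_url: str) -> str:
    """Extract the top-level group name from a group URL."""
    path = urlparse(group_url).path.strip("/")
    if not path:
        raise ValueError(f"no group found in URL: {group_url!r}")
    # Subgroups and projects follow the first path segment; keep only the first.
    return path.split("/")[0]


def enable_command(group: str) -> str:
    """Format the ChatOps command that enables the flag for one group."""
    return f"/chatops run feature set --group={group} {FEATURE_FLAG} true"


if __name__ == "__main__":
    group = top_level_group("https://gitlab.com/gitlab-org/")
    print(enable_command(group))
```

The output is meant to be pasted into #production by someone who already has ChatOps access.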

Learn more: see related feature flag issue.

For Self-Managed:

  • Not available for now.

Dashboards