Package:Container Registry Group

The Team

The Container Registry is part of the GitLab Package stage, which integrates with GitLab’s CI/CD product.

Who We Are

Team Members

The following people are permanent members of the Container Registry Group:

| Name | Role |
|------|------|
| Crystal Poole | Senior Engineering Manager, Package |
| Backend Engineer | Backend Engineer, Package:Container Registry |
| Hayley Swimelar | Senior Backend Engineer, Package:Container Registry |
| Jaime Martínez | Senior Backend Engineer, Package:Container Registry |
| João Pereira | Staff Backend Engineer, Package:Container Registry |
| Senior Backend Engineer | Senior Backend Engineer, Package:Container Registry |
| Rahul Chanila | Senior Frontend Engineer, Package:Container Registry |
| Senior Backend Engineer | Backend Engineer, Package:Container Registry |

Stable Counterparts

The following members of other functional teams are our stable counterparts:

| Name | Role |
|------|------|
| Greg Myers | Security Engineer, Application Security, Package (Package Registry, Container Registry), US Public Sector Services, Gitaly Cluster, Analytics (Analytics Instrumentation, Product Analytics), AI Working Group |
| Jackie Porter | Director of Product Management, Verify & Package |
| Tim Rizzi | Principal Product Manager, Package |

How We Work

Directly Responsible Individual (DRI)

A DRI is assigned to every substantial project or initiative the team works on. A project is considered substantial when the work involved is expected to span more than two milestones. When projects take that long to deliver, tasks such as the planning and breakdown of deliverables and regular async updates become increasingly important for the project’s success. Therefore, it makes sense to enforce the assignment of a DRI, who will be personally accountable for those tasks.

We strongly encourage everyone on the team to step forward and sign up as DRI for new projects. Ideally, all team members should experience this role over time. This promotes shared ownership, accountability and development opportunities for all team members.

In case of critical, unusually long, or highly complex projects, a specific DRI with the most experience on the subject may be assigned by the Engineering Manager. In these situations, other team members may volunteer or be assigned to shadow the assigned DRI and act as backup. This provides not only a learning opportunity for newer team members but also redundancy.

Apart from what is described in the DRI handbook page, DRIs leading projects on the team must perform the following tasks:

  • Keep the epic that serves as the single source of truth for the project up to date, along with the individual sub-epics and issues under it;
  • Consistently provide a weekly async update on the related epic. High-level updates on the root epic are required; low-level updates on sub-epics are optional;
  • Ensure there is at least one issue ready to be scheduled on the next milestone;
  • Engage with the Product Manager to have the issue(s) ready for development scheduled in the next milestone;
  • Keep the Engineering Manager and Product Manager aware of any unexpected changes to the plan;
  • Consult and collaborate with other DRIs when inter-project dependencies or blockers are identified;
  • Consult with other engineers when the project’s technical scope changes.

The DRI for a given project can be identified by looking at the corresponding epic’s description, which should include a section like the following:

## Owners

* Team: [Container Registry](/handbook/engineering/development/ops/package/container-registry/)
* Most appropriate slack channel to reach out to: `#g_container-registry`
* Best individual to reach out to: <!-- GitLab handle of the DRI, or "TBD" if none has been assigned yet -->
* PM: @trizzi
* EM: @crystalpoole

Additionally, we maintain a list of active projects and the assigned DRI on this page, in What Are We Working On.
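For tooling that needs to read the DRI from an epic description, the Owners section above can be parsed with a few lines of Python. This is a minimal sketch, not part of the team's process: the function name `find_dri` is hypothetical, and it assumes the "Best individual to reach out to" line is kept exactly as written in the template.

```python
import re
from typing import Optional

# Matches the DRI bullet from the Owners template. [ \t]* is used instead of
# \s* so that an empty value never swallows the following line.
_DRI_LINE = re.compile(
    r"^\*[ \t]*Best individual to reach out to:[ \t]*(.*)$", re.MULTILINE
)
# The template ships with an HTML comment as a placeholder for the handle.
_HTML_COMMENT = re.compile(r"<!--.*?-->", re.DOTALL)


def find_dri(epic_description: str) -> Optional[str]:
    """Return the DRI handle from an epic description (hypothetical helper).

    Returns None when the Owners line is missing, still holds only the
    placeholder comment, or is set to "TBD".
    """
    match = _DRI_LINE.search(epic_description)
    if match is None:
        return None
    value = _HTML_COMMENT.sub("", match.group(1)).strip()
    if not value or value.upper() == "TBD":
        return None
    return value
```

Treating the untouched placeholder and "TBD" the same way means a caller only has to handle one "no DRI assigned yet" case.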

Authors of merge requests related to a specific project should request a review from the assigned DRI or backup DRI to ensure they are aware of the changes and can provide the necessary oversight.

Alert and CI flake management

The team is responsible for monitoring the Slack channel #g_container-registry_alerts, where alerts and CI failure notifications are displayed for the registry service and code base (broken master). Service alerts are configured in the runbooks project and are defined following the infrastructure team process.

Process for handling alerts

The team has agreed on the following process to handle alerts:

  1. There is no person formally on-call (unless otherwise agreed during certain periods, e.g. end of year holidays).
  2. Everyone is responsible for keeping an eye on #g_container-registry_alerts during their working hours.
  3. When there is a new alert/CI notification:
    1. Add an 👀 emoji to the alert to signal it is being looked at.
    2. Click on an alert for details. Each alert may contain the following:
      • Runbook - how to deal with the alert.
      • Dashboard - link to the Grafana chart related to the metric that triggered the alert.
      • Pipeline that failed - broken master.
      • Sentry issue - contains the stack trace leading to the alert’s origin.
    3. Use the available resources to evaluate the problem.
    4. Determine if it’s safe to ignore:
      • There is an existing issue for this alert. If so, record this occurrence in the issue description, following the Alert Occurrence Template.
      • The logs/dashboards show that the issue seems to be resolved. For example, the Pending Tasks metric for the online garbage collector is going down after a sudden peak and there are no errors in the logs.
      • The alert has been automatically resolved.
      • Open an issue if this requires attention in the future. If the alert/CI notification is due to a flake, identify the severity of the failure and add an appropriate priority label. CC @trizzi in the issue for prioritization, and CC @gitlab-org/ci-cd/package-stage/container-registry-group so that the team is aware of the issue.
      • If this is a recurring alert that was deemed safe to ignore, consider raising an issue to adjust the alert thresholds. CC @trizzi in the issue for prioritization, and CC @gitlab-org/ci-cd/package-stage/container-registry-group so that the team is aware of the issue.
      • If you raised or updated an issue, ensure that it has the correct labels. If the problem is due to a flaky test, apply the ~"failure::flaky-test" label; ~"flaky-test::<type>" labels are optional but recommended. If it is due to an alert, apply the ~"container registry::alert" label. Finally, ensure that the issue has the appropriate ~"priority::N" label.
    5. Otherwise:
    6. Add a comment as a thread to the alert that you reviewed.
    7. Once the problem has been resolved or the required short-term investigation is complete, react with a ✅ emoji to the notification.
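The labeling rules in step 4 can be sketched as a small helper. This is only an illustration: the function name `issue_labels` and the `cause` values are assumptions made for this sketch, while the label strings themselves are taken from the process above.

```python
from typing import List


def issue_labels(cause: str, priority: int, flaky_type: str = "") -> List[str]:
    """Return the labels for an issue raised from an alert or CI notification.

    `cause` is either "flaky-test" or "alert" (names chosen for this sketch).
    `flaky_type` fills in the optional ~"flaky-test::<type>" label.
    """
    if cause == "flaky-test":
        labels = ["failure::flaky-test"]
        if flaky_type:
            # Optional but recommended, per the process above.
            labels.append(f"flaky-test::{flaky_type}")
    elif cause == "alert":
        labels = ["container registry::alert"]
    else:
        raise ValueError(f"unknown cause: {cause!r}")
    # Every issue also needs a priority label.
    labels.append(f"priority::{priority}")
    return labels
```

For example, `issue_labels("alert", 2)` yields the alert label followed by `priority::2`.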

Alert Occurrence Template

Add this template to the alert-related issue, or update it if it is already there, with the number of times the alert has been seen.

## Alert Occurrence Update

- **Occurrence Count**: X (previously Y)
- **Date/Time**: [Insert timestamp of occurrence]
- **Last occurrences**: [Insert slack link]
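If the update is produced by a script rather than by hand, the template above can be filled in as follows. This is a minimal sketch under stated assumptions: `render_occurrence_update` and its parameters are hypothetical names, and the output mirrors the template line for line.

```python
def render_occurrence_update(
    count: int, previous: int, timestamp: str, slack_link: str
) -> str:
    """Fill in the Alert Occurrence Update template (hypothetical helper)."""
    return "\n".join(
        [
            "## Alert Occurrence Update",
            "",
            f"- **Occurrence Count**: {count} (previously {previous})",
            f"- **Date/Time**: {timestamp}",
            f"- **Last occurrences**: {slack_link}",
        ]
    )
```

The returned text can be pasted into the issue description in place of the previous update.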

Resources

Logs

Dashboards

Other

📈 Measuring results

OKRs

We use quarterly Objectives and Key Results as a tool to help us plan and measure how to achieve Key Performance Indicators (KPIs).

We follow the standard, company-wide process for OKRs.

Performance indicators

We measure the value we contribute by using performance indicator metrics. The primary metric used for the Package Registry group is the number of monthly active users (GMAU).

What Are We Working On

Below is a list of projects and initiatives that we are currently working on, along with the corresponding DRI. We work on issues by priority, so projects may not have active development in every milestone. DRI engineers take responsibility for planning and delivery of upcoming work; however, issues can be assigned to any team member.

| Project | DRI | Backup DRI |
|---------|-----|------------|
| Release metadata database and online GC for self-managed installs | Hayley Swimelar | Jaime Martínez |
| Container Registry: Add support for tag immutability | João Pereira | Rahul Chanila |
| Database load balancing | João Pereira | Senior Backend Engineer |
| Confidential | Jaime Martínez | João Pereira |
| Database background migrations | Senior Backend Engineer | João Pereira |
| Improve the fragility and speed of tests | Senior Backend Engineer | Jaime Martínez |

What We’ve Recently Completed

Project Milestone Completed

Documentation

Project documentation is available here.

Last modified November 14, 2024: Fix broken external links (ac0e3d5e)