Environments Group

The Environments group is responsible for the Environments in the Deploy stage of the DevOps lifecycle.

Vision

For an understanding of where this team is going, take a look at the product vision.

As a member of the Ops Sub-department, you may also like to understand our overall vision.

Mission

OKRs

Product Indicators

Contribution to GitLab

Team Members

| Name | Role |
|------|------|
| Nicolò Maria Mezzopera | Fullstack Engineering Manager, Deploy:Environments |
| Anna Vovchenko | Senior Frontend Engineer, Deploy:Environments |
| Staff Backend Engineer | Staff Backend Engineer, Deploy:Environments |
| Taka Nishida | Senior Backend Engineer |
| Tiger Watson | Senior Backend Engineer, Deploy:Environments |
| Timo Furrer | Senior Backend Engineer |

Stable Counterparts

The following members of other functional teams are our stable counterparts:

| Name | Role |
|------|------|
| Ameya Darshan | Security Engineer, Deploy:Environments, Systems:Gitaly, Systems:Geo, Delivery, Snippets |
| Emily Bauman | Senior Product Designer, Deploy:Environments |
| Phillip Wells | Technical Writer, Deploy:Environments |
| Viktor Nagy | Senior Product Manager, Deploy:Environments |

Some dedicated Slack channels:

Insights

Processes

Acronyms

  • Engineers: All the Engineers of the Environments group
  • Engineering: Engineers and the Engineering Manager
  • EM: Engineering Manager
  • PM: Product Manager
  • FE: Frontend Engineer
  • BE: Backend Engineer
  • UX/PD: User Experience Designer
  • TW: Technical Writer

Meetings

Environments Team Meeting

We have one team meeting each week. The purpose of this meeting is to share information about ongoing projects. It also covers general announcements that are important for collaboration.

Meeting format:

  • Before and during the meeting, team members write anything they want to verbalize in the notes attached to the meeting.
  • We wait 1 or 2 minutes for the team members who want to join and then start recording when ready.
  • During recording, we go over each point in the document.
  • Anyone can facilitate the discussion. If the EM or PM is there, they will kick things off.
  • If the author of the current point being discussed is available, they can verbalize their point.
  • A short discussion may occur around each point, taking into consideration that we want to get through as much of the document as we can.
  • All team members are welcome and encouraged to help take notes in the document while the discussion takes place.
  • After we get through all the points, we stop recording.
  • If there is leftover time, team members may use the remaining unrecorded time to socialize or leave the meeting early.

If the meeting for the week has already taken place and you would like to add a new item for discussion, create a new section for the next meeting date above the last one and add your item.

Frontend, Go and Ruby Meetings

These are optional meetings on the team calendar. Everyone on the team is welcome. They are scheduled at times when as many as possible of the engineers who work primarily on the corresponding topics can attend.

These meetings are not too formal and also provide time for the engineers across time zones to discuss ongoing projects, ask questions, pair up, and catch up. We go through any agenda items first.

Meeting Links:

Technical Discovery Meetings

Sometimes we encounter issues that need the input of the whole team to be refined and then worked on; such issues are selected as topics for a Technical Discovery meeting. We try to be conscious of sync time, so we expect a maximum of two of these meetings per milestone. A technical discovery meeting consists of:

  • One meeting that everyone has a fair opportunity to join.
  • The meeting is recorded.
  • The meeting is announced at least one week before it will be held and each participant must familiarize themselves with the issue that is being investigated prior to attending.
  • Discussing the topic async in advance in the issue/epic is encouraged.
  • The PM will open the conversation either by describing the use case/scenario or by recording a quick video about it.
  • The meeting is agenda first and everyone is expected to write their comments and questions in the agenda.
  • If the agenda is empty the meeting is cancelled.
  • The conversations in the meeting must be recorded as notes in the same document.
  • One host is chosen for every meeting, and they are responsible for driving the conversation forward.
  • In the last meeting, someone is appointed to summarize the conversation, either in the original issue or in a technical document.

The goal of a technical discovery meeting is to come up with a concrete technical proposal for the question at hand. We should not force a proposal, but we aim to get there and write the conclusion accordingly, with potential follow-ups.

Milestone Checkup Meeting

Twice per milestone, on Tuesdays, we hold a milestone checkup meeting, where we either check the status of work-in-progress issues or plan the next milestone.

Team issue tracker

Issue refinement

During the Team Sync Meeting, the PM brings 2 to 3 issues that need a refinement DRI, and a DRI is chosen for each who will be responsible for refining the issue. Team members are encouraged to refine issues that they create or stumble upon at any time during their work if they have the bandwidth to do so.

The refinement process is described in the issue template.

Labels used

Discovery backlog

In discovery we use the following labels:

  1. ~workflow::problem validation (optional) - to signal loosely defined problems where either the user problem or the business value is not yet understood
  2. ~workflow::ready-for-design - this is our (likely endless) design backlog
  3. ~workflow::solution validation (optional) - used for work with a concrete solution proposal that needs user validation
  4. ~workflow::design - for ongoing design work

Delivery backlog

  1. ~workflow::refinement - this is our delivery backlog; it contains all the issues that have not yet been discussed in depth by engineers; this has no WIP limits
  2. ~workflow::scheduling - this is our backlog of already discussed issues; these issues are still waiting to be scheduled or even to be put on the roadmap; issues entering should have a preliminary weight; this has no WIP limits
  3. ~workflow::planning breakdown - this is the backlog for the upcoming milestone; it has a WIP limit of two months' capacity
  4. ~workflow::ready-for-development - this is the list of issues refined for the current or upcoming milestone; it has a WIP limit of two months' capacity; issues here should have a final weight

The PM is responsible for moving proposed issues from ~workflow::scheduling to ~workflow::planning breakdown and for maintaining the WIP limits. Everyone is welcome to recommend issues for planning.

The EM is responsible for moving accepted issues from ~workflow::planning breakdown to ~workflow::ready-for-development and for maintaining the WIP limits. Everyone is welcome to improve issues to make them ready for development.

Special labels

  • the ~environments::parked label is used to signal that we don’t intend to focus on an issue in the next 9-12 months

Milestone Board

The issues scheduled for a milestone can be tracked on the Milestone Board.

This board contains all the necessary columns to track the workflow of the team, in particular:

  • Labels of interest as outlined above
  • One or more Milestone columns containing the planned work for the given milestone.

All the columns are prioritised top to bottom.

Once a team member self-assigns an issue on the Milestone Board, issue labels should follow the Engineering Workflow.

For Merge Requests, it's up to the author and the project they are contributing to whether to use these ~workflow:: labels. It is not required to use them or to keep them synced with the Issue labels.

Planning

Team Domain Limitations

The Environments team is currently too small to fully support the entire scope of our feature categories. To signal our priorities and do meaningful work, we maintain a list of Feature Categories where we only do Critical Maintenance:

  • Auto DevOps
  • Feature Flags
  • Continuous Delivery
  • Infrastructure as Code
  • Release Orchestration

By Critical Maintenance we mean that we will only take on Security, Scalability, and Availability issues of p2/s2 and above, p1/s1 bugs otherwise classified, and issues the Product Manager considers impactful to fix.

Issues falling outside the mentioned types will be marked with the ~Environments::No-Capacity label and we will ignore their SLO. While we do not have the capacity to work on them, we welcome and will support any community contributions to those issues.

Issue Weighting

The weights we use are:

| Weight | Extra investigation | Surprises | Collaboration |
|--------|---------------------|-----------|---------------|
| 1: Trivial | not expected | not expected | not required |
| 2: Small | possible | possible | possible |
| 3: Medium | likely | likely | likely |
| 5: Large | guaranteed | guaranteed | guaranteed |

The above table is contextual. For example, domain knowledge, experience levels, and time at GitLab can impact an engineer’s perspective on whether an issue requires Extra Investigation or Surprises.

Weights are not set in stone. We do our best to get it right during refinement, but we want to be transparent and accurate. If an issue is taking more effort than is reflected in the existing weight, the DRI on the issue is encouraged to change the weight. We want accurate documentation of the level of effort that was required.

By giving a weight 1 to an issue, we’re saying “we can’t benefit from this issue being broken down into smaller units of shippable work.”

Anything weighted 5 or larger should be broken down; these issues should not be ready for development. We would likely turn a 5 into an epic, a research and implementation issue, or a technical discovery.

Occasionally, a proof-of-concept (POC) is necessary to determine a feasible technical path. When one is required, the engineer will create a POC issue that contains the context of the research to be conducted along with the goals of the POC. This issue will be scheduled for work before any further breakdown of tasks is performed. Once the technical path is clear, the engineer can proceed to weight the issue and/or break it down further to guide implementation. Every POC issue should contain a list of questions we want to answer, and the definition of done should include the answers and suggested next steps.

Not all POCs will be successful, and that is OK! When an avenue of research does not pan out, the POC will have saved us from investing significant time in a solution that would not meet our needs. The goal is early feedback and fast iteration.

As a note, designers use the design weight labels instead of using the weight input within the issue, which is reserved for engineering.

Weight, Velocity, and Planning

We intentionally leave the term “velocity” undefined and do not use it in planning workload capacity for the team.

We leave the question of interpreting summed weights open to each unique situation.

When making decisions about how much work the team can take on for a milestone, we trust individual impressions and instincts reflected in the discussions that take place in the planning issue and the refinement process. The weighting system helps foster these discussions.

GitLab Terraform Provider

The GitLab Terraform Provider is managed by the Environments group.

Feature development

Our goal is to move towards a continuous delivery model so the team completes tasks regularly, and keeps working off of a prioritized backlog of issues. We default to team members self-scheduling their work:

  • Team members self-assign issues from the Milestone Board that are in the workflow:ready for development column and have the current milestone.
  • ~Deliverable issues take priority over any other work, as they are the main focus of each milestone and inform our say-do ratio.
  • Once a team member has completed their assigned issues, they are expected to go to the Milestone Board and assign themselves to the next unassigned issue from the current milestone.
  • If there are no more issues in the current milestone, engineers are expected to assign themselves to the next unassigned workflow:ready for development issue.
  • The issues on the board are in priority order based on importance (the higher they are on the list, the higher the priority). This order is set by the product manager.
  • If all issues are assigned for the milestone, team members are expected to identify the next available issue to work on based on the team’s work prioritization (see below).
  • While backstage work is important, in the absence of specific prioritization, the team will have a bias towards working on bug or feature categorized issues.

~Environments::EngineeringChoice process

While diligently pursuing our objectives, we also recognize the significance of work that resonates personally with our engineers. To facilitate this, we have introduced the “~Environments::EngineeringChoice” label. Here’s how it works:

  1. Selection During Milestones: In each Milestone Plan, engineers are encouraged to select up to five issues (total for the group) that they find particularly interesting or valuable, marking them with the “~Environments::EngineeringChoice” label. These issues should improve the GitLab product or developer experience, but they don’t have to be in the ~“group::environments” domain.
  2. Limit per Milestone: To maintain focus, no more than five issues should be labeled with “~Environments::EngineeringChoice” within a single milestone.
  3. Priority After Deliverables: Once all mandatory ~Deliverables are completed, the next priority is to address issues labeled “~Environments::EngineeringChoice.”
  4. Refined issues only: Before applying the “~Environments::EngineeringChoice” label, the issue should be workflow::ready for development and accordingly needs a weight.
  5. Maximum issue size: To rule out likely surprises and extra investigation, only issues with weight 1-2 are acceptable for “~Environments::EngineeringChoice”.
  6. Tracking in Milestone Planning: Progress and choices under the “~Environments::EngineeringChoice” category will be monitored and recorded in a specific section of the Milestone Planning issue.

Bug fixing and prioritized work

In every milestone plan, we compile a list of bugs due in the coming milestone based on the severity SLA.

When severity labels are assigned or changed on a ~type::bug issue, we aim to set or adjust the issue due date at the same time. Everyone is encouraged to set the deadline based on the date of the last severity label update and the SLA for the given severity.
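
A minimal sketch of that calculation, assuming a hypothetical severity-to-SLA mapping (the day counts below are illustrative assumptions, not the canonical handbook SLOs):

```python
from datetime import date, timedelta

# Assumed SLA windows in days per severity label -- illustrative only,
# verify against the current handbook SLOs before relying on them.
SEVERITY_SLA_DAYS = {
    "severity::1": 30,
    "severity::2": 60,
    "severity::3": 90,
    "severity::4": 120,
}

def bug_due_date(last_severity_update: date, severity: str) -> date:
    """Due date = date of the last severity label update + the SLA window."""
    return last_severity_update + timedelta(days=SEVERITY_SLA_DAYS[severity])

# Example: a bug relabeled severity::2 on 2024-05-01 would be due 2024-06-30.
print(bug_due_date(date(2024, 5, 1), "severity::2"))
```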

Best practices for managing bug issues

Goals:

  • Effectively track and label bug related issues.
  • Ensure bug Due Dates are not missed due to a lack of DRIs on sub issues.
  • Ensure the team is aware that help is needed in a specific area on a bug that already has an overall DRI.

Context:

  • Single part bug issues

    • Some bugs only require a single cohesive effort to resolve, for example an isolated backend fix that requires no database or frontend changes. In these cases, the DRI of the bug issue is the person doing the work, and all work is tracked in the bug report issue.
  • Multi-part bug issues

    • In other cases, a bug issue may result in work across frontend, backend, and database. This can result in multiple engineers working separately as DRIs of individual issues that all contribute to solving the bug. Multiple issues are needed.

Problem:

  • Without a clear structure of issues for multi-part bugs, it’s difficult for the team to know how to help and how to plan. This difficulty can negatively impact our say-do ratio.

Best practices for managing multi-part bug issues:

  • The original bug issue should be promoted to an epic.
  • The original DRI becomes the overall bug epic DRI (note this on the epic).
  • New sub issues representing each part of the work should be created on the epic.
  • The new issues should be noted as blocking the epic.
  • Except for severity and priority, labels should be copied over.
  • Due dates should take into account the due date of the epic, which is based on severity and priority.
  • Deliverable labels should be applied if the epic is deliverable.
  • The DRI can use the Milestone Planning issue and/or reach out to relevant team members to ask if there’s availability within the Due Date. CC your engineering manager so they can give a high-level thumbs up/thumbs down regarding the change in priority.

Bug resolution process

The entire bug resolution process includes the following phases in order:

  1. GitLab Issue triage procedure: we have a handbook section we can follow here

  2. The Environments team’s refinement process

  3. Planning

  4. Reprioritization. The EM will change unplanned p3 bugs that have had no activity to p4 and remove the due date.

Putting the process together

  • A bug must be refined one milestone before it is due. This is done by the refinement DRI.
  • A bug fix must be planned for a milestone that ends before the bug’s due date. This takes place on the milestone planning issue.
  • Reprioritization will have a dedicated section in the milestone planning issue.
  • Outdated bugs are closed in accordance with the existing handbook practice.

Best practices

  • Read the issue triage handbook page
  • Ask the reporter for detailed steps to reproduce the problem, including minimal setup and expected versus actual outputs.
  • Request relevant documentation to validate the unexpected behavior, or an explanation if no documentation exists.
  • If the reporter is a GitLab team member, inquire if there are any insights on the impact of the issue, such as the number of users affected or specific features involved, to help prioritize the resolution.
  • Partner with the PM if you think it may not actually be a bug.

Say-do ratio

Our team keeps track of its commitments with say-do ratios. Two metrics are important: say-do and reprioritized say-do.

  • Say-do only applies to ~Deliverable issues.
  • By the 17th of the month, the EM applies the ~Deliverable label to the issues in the upcoming milestone.
  • We aim to assign roughly one ~Deliverable per engineer; this may change milestone by milestone.
  • Any issue that has the ~Deliverable label at that point is considered promised to be delivered and is part of our say-do ratio.
  • If at any time during the milestone the ~Deliverable label is removed, or the issue is removed from the milestone, that issue no longer counts toward the reprioritized say-do metric, but it still counts for say-do.

We aim to achieve 100% re-prioritized say-do and at least 80% say-do.

Example

  • In the milestone 15.11 we have 10 ~Deliverable issues labeled as such by the 17th of March 2023
  • Along the way we realise that 5 of those ~Deliverable issues will not make it, and reasonably before the end of the milestone, we move them to 16.0
  • At the end of the milestone there is a hiccup and, of those 5 remaining issues, 1 is not completed.

Our say-do ratio would be 40% (4 out of 10). Our reprioritized say-do would be 80% (4 out of 5).
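
The same arithmetic as a small sketch (the function below is purely illustrative, not a team tool):

```python
def say_do_ratios(committed: int, completed: int, reprioritized_out: int) -> tuple[float, float]:
    """Return (say-do %, reprioritized say-do %) for a milestone.

    committed         -- issues carrying ~Deliverable by the 17th
    completed         -- ~Deliverable issues actually delivered
    reprioritized_out -- issues whose label or milestone was later removed
    """
    say_do = 100 * completed / committed
    reprioritized_say_do = 100 * completed / (committed - reprioritized_out)
    return say_do, reprioritized_say_do

# The 15.11 example above: 10 committed, 5 moved to 16.0, 4 completed.
print(say_do_ratios(committed=10, completed=4, reprioritized_out=5))  # (40.0, 80.0)
```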

MR reviews

Team members should use their best judgment to determine whether to assign the first review of an MR based on the DangerBot’s suggestion or to someone else on the team. Some factors in making this decision may be:

  • If there is a known domain expert for the area of code, prefer assigning the initial review to them.
  • Does the MR require a lot of context to understand what it is doing? Will it take a reviewer outside the team a long time to ramp up on that context?
  • Does the MR require a lot of dev environment setup that team members outside the Environments group are likely not to have?
  • Is the MR part of a larger effort for which a specific team member already has all the context?

As team members and domain experts, both the MR author and initial reviewer are encouraged to share the broader context before, during, and throughout the review process to assist maintainers in conducting efficient reviews. This context may cover:

  • Known limitations;
  • Edge cases;
  • Implementation reasoning;
  • Links to relevant references.

Providing context helps streamline the review process and invites a broader pool of maintainers to our domain (example).

Handling Deferred UX

Team members should make their best effort to resolve UX issues as they come up during MR reviews. However, there are times when the changes requested or feedback given would significantly slow down velocity. For the sake of efficiency and iteration, a Deferred UX issue must be opened to follow up on the feedback.

In these instances, the engineer who authored the original MR should assign themselves the issue and become the DRI to evaluate the UX feedback. This may mean reaching out to the team’s Product Designer to ensure the feedback is actionable and resolving the debt is prioritized appropriately during the following milestone planning. For example, for Deferred UX issues opened in the 16.3 milestone, engineers should evaluate and ensure appropriate prioritization of the issue during the planning of the 16.4 milestone. This does not mean that the issue must be resolved during the 16.4 milestone, but that the issue is placed into the appropriate step of our product development flow, or closed if appropriate.

This helps to ensure that Deferred UX issues are resolved in a timely manner, keeping with the overall goals of the group and adherence to broader engineering workflows.

Epic Ownership

The Environments group uses epics to describe features or capabilities that will increase the maturity of the Environments categories over time.

Each epic should be owned by an engineer who is responsible for all technical aspects of that epic. The engineering DRI will work closely with the Product Manager and Product Designer to understand the requirements and create issues that encapsulate the technical work required during the design/solution validation phases and build track of the Product Development Flow. Each issue needs to be weighted and contain enough information in the description area for any other engineer on the team to be able to pick up that work.

For the duration of building the epic, the engineer does not need to be the only person implementing the issues. They should keep watch of the work that is done on the issues so that they can verify that the work is progressing correctly. If there are problems with the work, or lengthy delays, they need to make sure the Product Manager and Engineering Manager are aware.

When work is nearing completion, the engineer should make sure that any additional issues that may have come up during the build process are either addressed, or scheduled for work. Additional issues should be created and added to the epic. This will help to make sure that we do not build up technical debt while building.

Finally, they should also monitor any work that needs to occur while rolling out the Epic in production. If there are rake tasks, database migrations, or other tasks that need to be run, they need to see those through to being run on the production systems with the help of the Site Reliability counterpart.

This places a lot of responsibility with the DRI, but the PM and EM are always there to support them. This ownership removes bottlenecks and situations where only the PM or EM is able to advance an idea. In addition, the best people to decide on how to implement an issue are often the people who will actually perform the work.

To declare ownership, insert DRI: <your-gitlab-handle> at the top of the epic description. Example.

Quality Processes

Maintaining a high standard of quality is a critical factor to delivering winning products.

Within the Environments group we use the following processes and best practices to ensure high quality.

  1. We ensure each MR is accompanied with meaningful unit tests and integration tests.
  2. For each major feature we develop and maintain End to End tests that run nightly and confirm no regressions have been introduced to critical paths.
  3. On a weekly basis, we review our Triage report for bugs and regressions and take the appropriate action.
  4. We review the quality dashboard each milestone to track our long term progress at improving quality.

End to End Testing

The Environments group uses GitLab QA for End-to-End testing. We have guidelines for how our team is leveraging these tests.

gitlab-agent QA bot

In feed_alerts_configure we have a bot that runs tests against this project.

If this bot alerts on a failed pipeline, we should treat it the same as a broken master branch.

  • Check the pipeline for intermittent errors (and retry if this is the case)
  • Otherwise create an investigation issue to dig further/fix.

Error Budget

Our target availability is 99.9%
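
For intuition, the downtime allowed by a 99.9% target can be computed for any reporting window; the sketch below is illustrative, and the window lengths are assumptions rather than the definition used in the weekly report:

```python
def error_budget_minutes(target_availability: float, window_days: int) -> float:
    """Minutes of allowed unavailability in a window for a given availability target."""
    return (1 - target_availability) * window_days * 24 * 60

# A 99.9% target allows roughly 10 minutes of downtime per week
# and roughly 40 minutes over a 28-day window.
print(round(error_budget_minutes(0.999, 7), 1))   # ~10.1
print(round(error_budget_minutes(0.999, 28), 1))  # ~40.3
```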

Error Budget failure DRI process

Each week we receive an Error Budget report in #cd-section on Slack if we are under our target availability.

An engineer might be assigned as a DRI to look into this.

The DRI is neither expected to determine a root cause nor propose a solution on their own.

The DRI should instead reach out to the Scalability:Projections team for support.

Async Issue Updates

In order to optimize async collaboration across a big team, we use issue updates to share progress on a specific issue or epic. Weekly updates on progress and status are added to each issue by its assignee. A weekly update may be skipped if there was no progress. It’s preferable to update the issue rather than the related merge requests, as those do not provide a view of the overall progress. This applies to issues with the labels workflow::in dev or workflow::in review.

The status comment should include what percentage complete the work is, the confidence of the person that their estimate is correct, and a brief note on what was done. It’s perfectly acceptable to have multiple updates if more than one DRI is working on the issue.

As part of the async update, it’s important to verify that the workflow labels on the issue and related MRs are correctly set.

Example

## Async status update

- Complete: 80%
- Confidence: 90%
- Notes: expecting to go into review tomorrow

To simplify the work of adding and keeping track of async updates, TalTal can be used.

Career Development and Promotions

We want every team member to be advancing in their Career Development.

We follow the Engineering Department Career Development Framework.

Maximize asynchronous performance in this team

Async practices are particularly important to us because we live in time zones that do not afford much, if any, overlap during our working hours.

To maximize our asynchronous performance, we should follow the GitLab Communication guideline. More specifically, the following points are important:

  • Have an SSOT discussion page (Issue or MR). This is the main collaboration point that everyone can get the latest information quickly. The description section should contain essential and up-to-date information, such as:
    • What’s the problem to solve?
    • Who’s the DRI in charge of making the decision?
    • What are the acceptance criteria (e.g. user experience goal)?
    • Is anything out of scope?
    • What proposals do we have?
    • What are the PROs/CONs and technical difficulty of each proposal?
    • Whose approval do you need for making the decision?
    • When is the due date to make the decision?
    • FAQ
  • The DRI keeps the description updated with the latest information based on any decisions made in threads.
  • When a team member is asked to give input, they should respond as soon as possible to unblock discussions. It’s also fine to respond that you don’t have any feedback or can’t take time for it, so that the DRI can avoid waiting for your response.
  • If the DRI didn’t get much progress from the asynchronous communication, the DRI should schedule a synchronous meeting or reach out to broader audiences.
  • When the DRI schedules a sync meeting, they should make sure that agendas are prepared before the meeting starts.

Monthly Showcases DRI

We participate in the OPS showcase initiative. To facilitate the selection of topics and the creation of the issues and content, we have a Showcase DRI who will:

  • Ensure every month at least one showcase issue is created and linked in the right issues/epic
  • Facilitate the selection of the topic of each showcase, paying attention to give space to everyone on the team
  • Help whoever is creating the content with video creation and issue description
  • Ultimately, it is the showcase DRI’s responsibility that a showcase issue is produced and ready in time

Currently the showcase DRI for FY24Q3 is: @anna_vovchenko

How to work with us

Default to GitLab Issues

Why

We think that using GitLab Issues as much as possible is the best way to align with our values of Transparency and Efficiency. Using Issues gives us the greatest chance of collaboration, reusing any work done, and documenting the request and outcome in a findable, persisted way.

How

Follow the guidance in our request for help documentation.

How to contribute to Auto DevOps

Read our specific GDK instructions as well as our handbook entry on what existing testing does and how to develop features for Auto DevOps.

Shared Cloud Infrastructure

The Environments group has access to a shared GCP project which can be used for demos, experiments, or to host auxiliary services. The project ID is deploy-stage-shared-i-e55e01cb, and it was created and provisioned using the following ARs:

If you need to create permanent infrastructure in that GCP project, it’s encouraged to do it with Terraform to easily share and document the setup with the entire group. You can use this GitLab group to host the project.

If the infrastructure is temporary, you can manage it with whichever tools you prefer.

Example/Demonstration projects

When you need to create an example project for demonstration, consider having it in the example group instead of your personal namespace.

This allows us to collect all of the knowledge in the same place. Also, this example group has an EEP license by default.


Environments Group - GitLab Quality Assurance

End-to-End Testing for the Environments group

Overview

The goal of this page is to document how the Environments group uses the GitLab QA framework (video walkthrough) to implement and run end-to-end tests.

Supporting slides for the above walkthrough

Why do we have them

End-to-end testing is a strategy used to check whether your application works as expected across the entire software stack and architecture, including the integration of all micro-services and components that are supposed to work together.
