Authentication Group

Govern:Authentication

Mission

Our mission is to empower GitLab system administrators with the toolkit they need to create their desired balance of security and accessibility for their GitLab experience. Authentication is the first impression any new customer has when they configure their shiny new GitLab instance, and we aim to make it as seamless as possible: from that moment of first logging in, to onboarding users, to managing the basic security rules for their instance in a secure, flexible and scalable manner.

Top Priorities for FY25

Our detailed priority list can be found at the direction page however on a higher level the focus would be on:

  1. GCP integration
  2. Cells readiness
  3. Service accounts MVC features
  4. Passwordless authentication
  5. Enterprise user admin controls and policies management
  6. Bringing Credentials inventory to GitLab.com
  7. Reducing the number of flakey tests, older FFs, S3 bugs and manual handing of support/CSM questions.
  8. Group SCIM Sync support
  9. Service accounts UI and enhanced capabilities
  10. Token management enhancements
  11. Credential Manager enhancements

Customer Outcomes we are driving for Gitlab

As a result of the above roadmap items, we aim to driver the following outcomes for our customers:

  • By supporting Cells work, customers should experience improved reliability and compartmentalization of disruptions on GitLab.com
  • Expanding Service accounts will reduce the human touch points around credentials setup for automation use cases, bolstering customer’s security posture and efficiency when using GitLab. At the same time, new Service account UI will allow them to easily setup, manage and revoke these higher privilege accounts providing better transparency and audit-ability. Combined with token management enhancements, customers will be able to confidently manage, enforce and mitigate access token related risks.
  • Additional Enterprise user admin controls will result in reduced workload and improved security policy management for organizations while migration of Credential Inventory to SaaS will provide all administrators better visibility and control around credentials in use.
  • We aim to improve and unify user provisioning setup for customers by expanding SCIM group sync support such that they don’t need to rely on both SAML and SCIM for user provisioning and access management.

List of OKRs

Our OKRs are focused on:

Metrics the team tracks

In order to ensure we manage our technical debt while making progress on our roadmap, and addressing security vulernabilities or bugs in time, the team tracks the following metrics:

  • Engineering productivity metrics

    • MR rates across team to ensure changes are being delivered iteratively
    • S1, S2 and S3 bugs and burndown rates to ensure that we are moving towards a downward trend on all 3. Currently no S1 are open, with a very small number of S2s.
    • Vulnerability due dates, in particular any that run the risk of hitting SLOs
    • Error rates for Authentication API, workers and web workflows
    • Infradev issues SLOs
    • Maintainer-ship status for the group.
    • Domain knowledge areas and coverage across each one for bus factor tracking.
    • Feature flags associated with the group
    • Flakey test attributed to the group
  • Product metrics

    • Adoption of auth integrations such as SAML, Identity providers, Group sync
    • Enterprise users claimed

What makes us different?

The Authentication group is a central piece to the GitLab product! While many groups focus on single area - like the repository view, or the merge request view, this group has a much broader impact on many areas. Because of this, there are some key topics that we keep on our mind more than other groups might:

Security vulnerability issues

Bypassing permissions and authentication mechanisms are, by nature, common security targets. We closely watch new security vulnerability issues and schedule them as quickly as possible based on their due date. For planning security issues we use the (filtered) milestone planning issue board. We expect all team members to be able to resolve security issues following the security release process.

Code Review

Because this group works on components of the application that have a far-reaching impact, we take these extra steps in order to reduce our risk of a production incident:

  1. Our team’s merge requests should be assigned to another Auth team member for first review in order to build more institutional knowledge across the team. This review should be done as a reviewer. The Auth approval counts as the approval matching the role of the Auth Reviewer, e.g. having a Backend Review from Auth counts as a Backend Review. Once approved, the Auth Reviewer should request a review from a Maintainer from the appropriate maintainer category.
  2. Auth merge requests will include a comment that needs answered before merging, “Should this be behind a feature flag?” This is an effort to remind engineers about feature flag usage, but also to challenge reasoning as to why changes do not need to be behind a feature flag.
  3. Auth related merge requests require a review by an Auth Engineer. This is guarded by using the CODEOWNERS feature of GitLab.

How we work

  • In accordance with our GitLab values.
  • Transparently: nearly everything is public, we record/livestream meetings whenever possible.
  • We get a chance to work on the things we want to work on.
  • Everyone can contribute; no silos.
    • The goal is to have product give engineering and design the opportunity to be involved with direction and issue definition from the very beginning.
  • We do an optional, asynchronous daily stand-up in our group stand-up channel:

With our counterparts

You are encouraged to work as closely as needed with stable counterparts. We include quality engineering and application security counterparts prior to a release kickoff and as-needed during code reviews or issue concerns.

Quality engineering is included in our workflow via the Quad Planning Process and is responsible for bug prioritization during release planning. They are also the DRI when it comes to adding new end-to-end tests, though anyone can contribute. Here are some examples of when to engage with your counterpart:

Application Security will be involved in our security issue workflow and should participate in other feature, bug, or maintenance issues before they are scheduled if we need to notate any concerns or potential risks that we should be aware of. Here are some examples of engaging with your counterpart:

Keeping yourself informed

Planning

We plan in monthly cycles in accordance with our Product Development Timeline. Our typical planning cycle includes:

  • At the beginning of each month, Product should have created a planning issue for the coming release.
    • This issue should include the general themes of a release. It should also include placeholders for the quad to put their prioritized feature, bug, and maintenance lists.
    • Issues without estimates should be investigated by engineering throughout the month so weighting does not need to happen at once. Once the prioritized feature list is determined for a milestone, engineering will put a weight on any issues on the list that do not have a weight.
  • Between the first 12 days the month we estimate the issues but also bring up any key issues we as a team believe need to be included in the upcoming milestone (bug/maintenance/features). Over time this allows us to build up a pool of issues for upcoming miletones we mutually agree are important.
  • By the 12th day, all planned issues proposed for the next release should be estimated by engineering and put into workflow:ready for development.
    • The EM should add any issues that have the potential to slip. Issues that we are sure will slip should be automatically rescheduled to the upcoming release in advance.
    • After estimation, Product should make any needed adjustments to the proposed scope based on estimates. At this stage, any issues that the team is committing to delivering should have the ~Deliverable label applied to them by the engineering manager. Critical items such as ~bug::vulnerability or others that should be worked on first to allow for a greater chance of completion can be found in the ~Pick-first column of our workflow dashboard such as this one

Some items may be marked for a certain milestone, but do not have the ~Deliverable label. This means that they have not been committed to for a milestone and should be moved to the backlog or reconsidered for a future release. The Product Manager facilitates this process.

Estimation process

Before work can begin on an issue, we should estimate it first after a preliminary investigation. If the scope of work of a given issue touches several disciplines (docs, design, frontend, backend, etc.) and involves significant complexity across them, consider creating separate issues for each discipline (see an example).

After the PM creates a planning issue for the next release, the quad will gather a prioritized list of issues that should be considered. The EM will then create a “breakdown” issue with a checklist of issues that need to be estimated (example). All Engineers from the group should be assigned to that issue and participate in weighting or breaking down the list.

When estimating development work, please add the appropriate weight to the issue:

Weight Description (Engineering)
1 The simplest possible change. We are confident there will be no side effects.
2 A simple change (minimal code changes), where we understand all of the requirements.
3 A simple change, but the code footprint is bigger (e.g. lots of different files, or tests affected). The requirements are clear.
5 A more complex change that will impact multiple areas of the codebase, there may also be some refactoring involved. Requirements are understood but you feel there are likely to be some gaps along the way. We should challenge ourselves to break this issue into smaller pieces.
8 A complex change, that will involve much of the codebase or will require lots of input from others to determine the requirements. These issues will often need further investigation or discovery before being ~workflow::ready for development and we will likely benefit from multiple, smaller issues.

We do not provide estimates greater than 8. If an issue is bigger, we will split the issue or reduce its scope.

If the issue has weight 5 or less, the Engineer adding an estimate should also put the ~"workflow::ready for development" label on it.

If the issue has weight more than 5 (or 5 but it seems it might be split into multiple issues) the Engineer will suggest how to split it. If the Engineer is clear about the splitting they should proactively split the issue themself, put estimations on the child issues and leave a comment on the parent one. If it results to multiple issues, creating an epic should be considered.

When an Engineer is done with their estimates they should ask another team member to review their weight. This makes it harder to miss a significant aspect of an issue. If the first Engineer is confident (eg. has deep expertise in the problem or the issue is trivial) they can skip this step.

Purpose of Estimation

Our purpose of estimation is to confirm that we

  • clearly understand the goal of the issue
  • understand what a solution could be, but not necessarily all of the implementation details
  • assess whether blockers exist or not

We do not expect that during estimation, implementation details are scrutinized, all blockers are caught, or prove that the proposed solution is the right one. This all happens during development. In development, we, sometimes, reshape the issue, change its solution, or even postpone it. We are proud of doing it.

Considerations for Estimation

With keeping the above in mind, some areas to consider when estimating are:

  • If this work needs to be broken down into smaller tasks? Including adding a spike for that investigation/break down.
  • Impact on unit, feature and QA testing. Sometimes an apparently small modification in the code can lead to many changes in the tests.
  • Is there a hidden inter-dependency between the frontend and backend that needs to be agreed upon?
  • Are there data persistence requirements e.g in browser, redis or database?
  • How would this feature/bug-fix be deployed and is there complexity in managing the the rollout?
  • Does this work require collaboration across teams? e.g security

During a release

Engineers should assign issues to themselves based on the current release and ~Deliverable label. We use the following workflow label cycle on our issues:

  1. workflow::ready for development - work is ready to begin, but has not yet started
  2. workflow::in dev - we are looking at the issue, and are either actively investigating, have begun development, or have a draft merge request open
  3. workflow::in review - a merge request has been submitted and reviews have been requested
  4. workflow::verification - the work has merged, and needs to be verified
  5. workflow::complete - the work has been verified by the personed assigned to verify; verifier will close the issue and apply label
  6. workflow::awaiting security release - (security MRs only) the work is complete, just pending backports.

Development should not begin on an issue before it’s been estimated and given a weight.

Issues with the workflow::verification or workflow::awaiting security release labels are, for the purposes of release planning, the same as closed issues, because the dev work is complete.

For work items that span greater than 1 week or are high priority deliverables (such as critical infradev issues or bugs), the assignee for the work aims to provide a brief weekly update on the progress of tasks within the issue being worked on. This is intended to be 2-3 lines highlighting what was accomplished recently, and the next steps. We do this to share the progress transparently with our counterparts but also to share domain knowledge and course correct on implementation details if uncovered after work has started.

Verification

The issue verification step is optional starting starting 16.4 and will be required in 16.6. It should be done by someone else other than the MR author. Verification decreases the risk of defects getting into production and a different perspective to cover more test cases.

  • All MRs should have verification steps in the description. In the case where multiple MRs are created for an issue, the engineer who is assigned to the issue should add complete verification steps in the issue description or as a reply to the triage bot’s comment.
  • When an engineer has merged their work, they should move their issue into the verification status, indicated by the ~workflow:verification label and wait until they receive notification that their work has been deployed on staging via the release issue email.
  • Issues in the ~workflow:verification state are assigned randomly by the triage bot based on the verification policy to an applicable team engineer. This engineer should then additionally verify the issue.
Labels and how we use them

We have many labels that can be applied to an issue or merge request. Besides the issue workflow labels above, here are the minimum basic labels to apply to issues and merge requests:

  • Type (type::feature, type::bug, or type::maintenance)
  • Stage that owns the area (devops::govern)
  • Group that owns the area (group::authentication)
  • Specialization (frontend, backend, database, documentation)
  • security if the issue is related to application security, and breaking change if this work is considered a breaking change

Working on unscheduled issues

With the combination of our capacity planning (EM) and estimation (IC) processes above, engineers should have free time to work on ~“Stuff that should just work” or other topics that interest them. If an unscheduled issue should really be prioritized, bring it up in a planning issue or ask your manager to reduce your capacity further.

How we prioritize for a release

We have cross-functional prioritization aligned with our prioritization framework. The engineering manager will prioritize type::maintenance issues, the product manager will prioritize type::feature issues, and the software engineer in test will prioritize type::bug issues. From there, we are able to select a ratio of the top issues to be planned for the release by using our cross-functional issue board. Starting 16.5, our target ratio is to plan 60% features, 20% bugs, and 20% maintenance per release. Security issues do not count towards these ratios, but instead take away from the total capacity. The data below helps us understand our overall cross-functional status.

Roadmap planning and t-shirt sizing

At the start of the fiscal year, product and engineering counterparts collaborate over areas we want to focus on. This allows us to prepare issues that will be added to priorities.yml ahead of each milestone, gauge whether we are overcommitting to the year, and transparently share what the delivery goals are for the upcoming year. As part of this exercise, engineering also performs a high level t-shirt sizing to assist with planning. It’s important to highlight that the sizing is high level estimation, based on expected complexity and not the actual length of time taken to complete the work similar to how milestone estimation is done. An example of the scales used would be:

Average story points completed per milestone: 48w. This includes bugs, features, maintenance, and any security vulnerability work.

T-shirt sizing scale:

  • Small → < 5w
  • Medium → < 8w
  • Large → < 16 w
  • X-large → 32 w

The roadmap items are then marked with the Small, Medium labels and a priority to ensure, higher priority work is scheduled and appropriately resourced.

Monthly cross-functional dashboard review

We create a monthly issue that is assigned to all counterparts. The issue has to be created manually by an Engineering Manager, but we have an issue template in the Authentication discussion repo.

You can read more about this process on the handbook page.


Although we try to plan as described above, we understand that there is always a risk to the ideal ratios due to unexpected high priority security, infradev, and escalation issues.

Meetings

Although we have a bias for asynchronous communication, synchronous meetings are necessary and should adhere to our communication guidelines. Some regular meetings for Authentication are:

Frequency Meeting DRI Attendees Topics
Weekly Group-level meeting (internal only) Product Manager PM/EM/UX/SET, optionally engineers Discuss current or upcoming roadmap topics, bring epics/issues up for attention, provide a current status of priority initiatives

For one-off, topic specific meetings, please always consider recording these calls and sharing them (or taking notes in a publicly available document).

Agenda documents and recordings can be placed in the shared Google drive (internal only) as a single source of truth.

All meetings and 1-1’s should have an agenda prepared in advance. If this is not the case, you are not obligated to attend the meeting. Confirm if the meeting should be canceled if there is not an agenda by the start time of the meeting.

Group Members

Auth group members who are part of the Authentication group can be @ mentioned on GitLab with @gitlab-org/govern/authentication.

The following people are permanent members of the group:

Name Role
Adil FarrukhAdil Farrukh Engineering Manager, Govern:Authentication
Andrew EvansAndrew Evans Senior Backend Engineer, Govern:Authentication
Bogdan DenkovychBogdan Denkovych Backend Engineer, Govern:Authentication
Drew BlessingDrew Blessing Senior Backend Engineer, Govern:Authentication
Eduardo Sanz-GarciaEduardo Sanz-Garcia Senior Frontend Engineer, Govern:Authentication
Hannah SutorHannah Sutor Principal Product Manager, Govern:Authentication and Authorization
Imre FarkasImre Farkas Staff Backend Engineer, Govern:Authentication
Smriti GargSmriti Garg Senior Backend Engineer, Govern:Authentication
Aboobacker MKAboobacker MK Senior Backend Engineer, Govern:Authentication

Dashboards