MLOps Jobs to Be Done

What are the problems users want MLOps to solve?

NOTE: The MLOps Incubation Engineering project has become the MLOps team. These pages are left for historical purposes, but are not actively maintained. Please refer to the MLOps team page for updated information.

What is this

To better contextualize our efforts on MLOps Incubation Engineering, we are defining a list of Jobs To Be Done (JTBD), the objectives our users are trying to accomplish in MLOps. We will anchor each JTBD to the stage it belongs to, and each MLOps exploration will try to address one or more of these JTBD.

This is a living issue, and new JTBD will continue to be added based on our understanding of MLOps and user feedback.

Why is MLOps different from traditional DevOps?

While the goal of MLOps is the same as that of DevOps, the JTBD differ because software that includes ML (sometimes referred to as Software 2.0) differs from traditional software. In traditional software, logic is made explicit through code; in ML, the logic is implicit in the data and is extracted through a variety of techniques. Because the data carries the logic, it becomes a first-class citizen for Ops (the quality of the underlying data directly impacts the quality of the software's output), and the fact that the logic is not explicit introduces different types of vulnerability and uncertainty that need to be addressed. Note that MLOps is not a branch of DevOps; it is a superset of it.
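The point above, that the logic lives in the data rather than in the code, can be made concrete with a minimal sketch. The threshold learner and the sample datasets below are illustrative inventions, not from this page: the same code produces a different decision rule when the training data changes.

```python
def learned_threshold(examples):
    """Extract a decision rule from labeled data: place the boundary
    halfway between the largest negative and smallest positive example."""
    negatives = [x for x, label in examples if not label]
    positives = [x for x, label in examples if label]
    return (max(negatives) + min(positives)) / 2

# The "logic" is implicit in the data: two training sets yield two
# different rules from the exact same code.
data_v1 = [(1.0, False), (2.0, False), (8.0, True), (9.0, True)]
data_v2 = [(1.0, False), (4.0, False), (5.0, True), (9.0, True)]

rule_v1 = learned_threshold(data_v1)  # 5.0
rule_v2 = learned_threshold(data_v2)  # 4.5
```

This is why data quality is an Ops concern in ML: a change in the inputs silently changes the program's behavior, with no diff appearing in the code.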

MLOps JTBD

When working with software that includes Machine Learning, I want to be able to iterate on the models with confidence, so that I can deliver the most value to my users.

MLOps JTBD per stage

Plan

Code Job Description
IE_MLOPS_PL_1 When starting a project, I want to make a decision on when using Machine Learning becomes ROI positive, so that I don’t optimize prematurely
IE_MLOPS_PL_2 When starting to work with ML, I want to find knowledge from previous analyses done by colleagues, so that I can build upon them

Create

Code Job Description
IE_MLOPS_CR_1 When working on a Machine learning model, before release, I want to share and discuss the code with colleagues, so that I catch bugs and bad assumptions
IE_MLOPS_CR_2 When creating a machine learning model, I want to encode the training as a pipeline, so that I can minimize human error
IE_MLOPS_CR_3 When optimising a model, I want to compare the outcomes of potential hyperparameter variations, so that I can choose the best candidate
IE_MLOPS_CR_4 When creating a Machine Learning model, I want automatic creation of models from input data and desired outcome, so that I can focus on modelling the data and the business case
IE_MLOPS_CR_5 When creating a Machine Learning model, I need access to production data, so that my models are accurate
IE_MLOPS_CR_6 When creating a Machine Learning model, I want access to synthetic data, so that I don’t violate users’ privacy
IE_MLOPS_CR_7 When training a model, I want to run part of my code in one or more remote machines, so that I can finish training faster
IE_MLOPS_CR_8 When training a new model, I want to explore existing models so that I can build upon them
IE_MLOPS_CR_9 When creating a machine learning model, I want to explore and process the data available, so that I can decide how to approach the problem
IE_MLOPS_CR_10 When creating a machine learning model, I need to label my dataset either manually or via crowdsourcing, so that I can define the target of the machine learning model
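IE_MLOPS_CR_2 above, encoding the training as a pipeline, can be sketched as an ordered sequence of steps run by one driver, so that no step is applied by hand. The step names and the toy "model" below are illustrative assumptions, not from this page:

```python
def load(raw):
    # Parse raw records into (feature, label) pairs.
    return [(float(x), y) for x, y in raw]

def clean(rows):
    # Drop rows with missing labels instead of patching them by hand.
    return [(x, y) for x, y in rows if y is not None]

def train(rows):
    # Toy "model": the mean feature value of the positive class.
    positives = [x for x, y in rows if y]
    return sum(positives) / len(positives)

def run_pipeline(raw, steps=(load, clean, train)):
    # Each step consumes the previous step's output; the sequence is
    # encoded once, which minimizes manual, error-prone intervention.
    result = raw
    for step in steps:
        result = step(result)
    return result

model = run_pipeline([("2.0", True), ("4.0", True), ("1.0", None)])  # 3.0
```

Because the whole run is one function call over a fixed step list, any colleague (or a scheduler) can reproduce training without remembering an undocumented manual sequence.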

Verify

Code Job Description
IE_MLOPS_VR_1 When the underlying data changes, I want to be informed so that I can adapt the model
IE_MLOPS_VR_2 When code or data for the model changes, I want to rerun hyperparameter tuning so that I can choose the best model
IE_MLOPS_VR_3 When testing ML code, I want to run the training DAG as specified during create, so that I don’t duplicate work
IE_MLOPS_VR_4 When working with ML, I want to schedule model trainings based on time or triggers, so that the model doesn’t suffer performance degradation
IE_MLOPS_VR_5 When working with ML, I want to minimize how often each step of the DAG is run, so that costs are reduced
IE_MLOPS_VR_6 When a model is deployed, I want to verify whether the labeling was done correctly, and is still valid, so that I am sure the targets were correct
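IE_MLOPS_VR_5 above, minimizing how often each DAG step is run, is commonly implemented by keying a cache on a digest of the step's inputs, so a step only re-executes when its inputs actually changed. A minimal sketch, with an invented step name and an in-memory cache standing in for real pipeline storage:

```python
import hashlib
import json

CACHE = {}

def cached_step(name, fn, payload):
    """Run a DAG step only when its inputs changed: key the cache on a
    digest of the step name plus its JSON-serializable inputs."""
    key = hashlib.sha256(
        json.dumps([name, payload], sort_keys=True).encode()
    ).hexdigest()
    if key not in CACHE:
        CACHE[key] = fn(payload)
    return CACHE[key]

calls = []

def expensive_transform(rows):
    calls.append(1)            # track real executions
    return [r * 2 for r in rows]

cached_step("transform", expensive_transform, [1, 2, 3])
cached_step("transform", expensive_transform, [1, 2, 3])  # cache hit
cached_step("transform", expensive_transform, [1, 2, 4])  # input changed
# expensive_transform ran twice, not three times
```

The same keying scheme lets IE_MLOPS_VR_3 reuse the training DAG specified during Create: unchanged upstream steps are served from the cache instead of being duplicated.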

Packaging

Code Job Description
IE_MLOPS_PG_1 When working on software that includes Machine Learning, I want to package models in a common format, so that I avoid changing code on every release
IE_MLOPS_PG_2 When multiple versions of the models exist, I want to trace the code, data and configuration that was used to create each of them, so that I can reproduce the model
IE_MLOPS_PG_3 When iterating on a model, I want to search and browse through past versions, so that I can deploy a past version
IE_MLOPS_PG_4 When iterating on a model, I want to see the performance of past models, so that I can communicate progress over time
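IE_MLOPS_PG_2 above, tracing the code, data, and configuration behind each model version, can be sketched by recording a digest of each input alongside the packaged model. The record layout is an illustrative assumption, not a format this page prescribes:

```python
import hashlib
import json

def fingerprint(blob: bytes) -> str:
    # Short content digest; enough to detect any change to the input.
    return hashlib.sha256(blob).hexdigest()[:12]

def record_version(code: bytes, data: bytes, config: dict) -> dict:
    # Store digests of everything needed to rebuild the model, so any
    # past version can be traced back to its exact inputs.
    return {
        "code": fingerprint(code),
        "data": fingerprint(data),
        "config": fingerprint(json.dumps(config, sort_keys=True).encode()),
    }

v1 = record_version(b"def train(): ...", b"row1\nrow2", {"lr": 0.1})
v2 = record_version(b"def train(): ...", b"row1\nrow2", {"lr": 0.1})
assert v1 == v2  # identical inputs -> identical, reproducible version record
```

Browsing past versions (IE_MLOPS_PG_3) then reduces to looking up these records and re-fetching the inputs they point to.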

Release

Code Job Description
IE_MLOPS_RL_1 When deploying an ML model, I want to deploy packaged ones, so that I minimize human intervention
IE_MLOPS_RL_2 When deploying an ML model, I want to be able to deploy multiple versions simultaneously, so that I can compare their behavior
IE_MLOPS_RL_3 When a model is deployed, it should have access to the data in the same format as during training, so that it doesn’t behave differently
IE_MLOPS_RL_4 When a model is deployed, I want my data transformation pipelines to be orchestrated, so that the model has the right data available
IE_MLOPS_RL_5 For deployed models, I want to batch predictions, so that I can avoid heavy computation during runtime
IE_MLOPS_RL_6 For deployed models, I want to make predictions on streaming data, so that my model reacts quickly to new data
IE_MLOPS_RL_7 When deploying a machine learning model, I need to compare prior and current version on live data, so that I can publish the best version
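IE_MLOPS_RL_2 and IE_MLOPS_RL_7 above, running versions side by side and comparing them on live data, are often handled with a shadow deployment: the prior version keeps answering requests while both versions are scored on the same inputs. The toy models and scoring below are illustrative assumptions:

```python
def shadow_compare(requests, champion, challenger, truth):
    """Score the prior (champion) and current (challenger) versions on
    the same live inputs, so the better one can be promoted."""
    scores = {"champion": 0, "challenger": 0}
    for x in requests:
        expected = truth(x)
        if champion(x) == expected:
            scores["champion"] += 1
        if challenger(x) == expected:
            scores["challenger"] += 1
    return scores

# Toy models: classify a number as "big" above a learned threshold.
champion = lambda x: x > 5      # prior version
challenger = lambda x: x > 3    # current candidate
truth = lambda x: x > 3         # live outcome, observed later

scores = shadow_compare([1, 2, 4, 6, 8], champion, challenger, truth)
# scores -> {"champion": 4, "challenger": 5}
```

Because both versions see identical live traffic, the comparison is fair, and promoting the challenger is a data-backed decision rather than a guess.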

Secure

Code Job Description
IE_MLOPS_SC_1 When using ML models as libraries (e.g. fine-tuning transformers), I want to be warned whether the parameters were tampered with, so that I don’t introduce unwanted behaviour
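One common way to address IE_MLOPS_SC_1 is to publish a cryptographic digest alongside the model artifact and verify it before use; any change to the parameters changes the digest. A minimal sketch, with invented byte strings standing in for real parameter files:

```python
import hashlib

def params_digest(params: bytes) -> str:
    # Publish this digest alongside the released model artifact.
    return hashlib.sha256(params).hexdigest()

def verify_params(params: bytes, expected_digest: str) -> bool:
    # Before fine-tuning a downloaded model, check its parameters
    # against the published digest to detect tampering.
    return params_digest(params) == expected_digest

original = b"\x00\x01weights"
published = params_digest(original)

assert verify_params(original, published)                # untouched
assert not verify_params(b"\x00\x02weights", published)  # tampered
```

A full solution would also sign the published digest, since an attacker who can alter the weights may be able to alter an unsigned checksum too.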

Configure

Code Job Description
IE_MLOPS_CF_1 When working on software that includes Machine Learning, I want to control resources for each model accordingly, so that I provide the best user experience while keeping costs low
IE_MLOPS_CF_2 When training or running a machine learning model, I want to specify the hardware configurations needed, so that I don’t need to file a ticket for new resources
IE_MLOPS_CF_3 When training or running a machine learning model, I don’t want to depend on a specific cloud provider, so that I avoid vendor lock-in

Monitor

Code Job Description
IE_MLOPS_MN_1 When input data configuration changes, I want to be warned, so that I can retrain my machine learning model
IE_MLOPS_MN_2 When the performance of a Machine Learning model degrades, I want to be informed, so that I can retrain my machine learning model
IE_MLOPS_MN_3 When a deployed machine learning model is consuming too many resources, I want to be informed, so that I don’t incur excessive costs
IE_MLOPS_MN_4 After deploying a model, I want to run ad hoc analysis on the prediction data, to explore the outcomes and potential improvements
IE_MLOPS_MN_5 After deploying a model, I want to run ad hoc analysis on the prediction data, so that I can answer questions from the business
IE_MLOPS_MN_6 When deploying a model, I want to have model performance divided by cohorts of users, so that I can detect potential biases
IE_MLOPS_MN_7 After deploying a model, I want to label predictions following the initial labeling process, so that I can verify if the model works as expected
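IE_MLOPS_MN_1 and IE_MLOPS_MN_2 above both reduce to comparing live behavior against a training-time baseline and alerting when it drifts. A minimal sketch of input-drift detection on a single feature; the summary statistic, tolerance, and sample values are illustrative assumptions:

```python
def summarize(values):
    # Baseline statistic for one feature; real monitors track many.
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return mean, var

def drifted(baseline, live, tolerance=0.5):
    """Warn when the live input distribution moves away from the
    training-time baseline by more than the tolerance."""
    base_mean, _ = summarize(baseline)
    live_mean, _ = summarize(live)
    return abs(live_mean - base_mean) > tolerance

training_inputs = [1.0, 2.0, 3.0, 2.0]  # mean 2.0 at training time
stable_inputs = [2.1, 1.9, 2.0, 2.0]    # live window, mean 2.0
shifted_inputs = [4.0, 5.0, 4.5, 4.5]   # live window, mean 4.5

assert not drifted(training_inputs, stable_inputs)
assert drifted(training_inputs, shifted_inputs)  # time to retrain
```

The same alert can feed the scheduled retraining in IE_MLOPS_VR_4: a drift warning becomes the trigger that kicks off a new training run.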

Resources

Last modified August 16, 2024: Replace aliases with redirects (af33af46)