MLOps Jobs to Be Done

What are the problems users want MLOps to solve?

What is this?

To better contextualize our efforts on MLOps Incubation Engineering, we are defining a list of Jobs To Be Done (JTBD) - the objectives our users are trying to accomplish in MLOps. We will anchor each JTBD to the stage it belongs to, and each MLOps exploration will try to address one or more of these JTBD.

This is a living issue, and new JTBD will continue to be added based on our understanding of MLOps and user feedback.

Why is MLOps different from traditional DevOps?

While the goal of MLOps is the same as that of DevOps, its JTBD result from the differences between software that includes ML (sometimes referred to as Software 2.0) and traditional software. In traditional software, the logic is made explicit through code; in ML, the logic is implicit in the data and is extracted through a variety of techniques. Because the software relies on data, the data becomes a first-class citizen for Ops (the quality of the underlying data directly impacts the quality of the software's output), and the fact that the logic is not explicit brings different types of vulnerability and uncertainty that need to be addressed. Note that MLOps is not a branch of DevOps; it is a superset of it.
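As a minimal illustration of this difference (the fraud example, feature names, and thresholds below are made up, and scikit-learn is only an assumed stand-in for any ML library), compare logic that is written explicitly in code with the same kind of logic learned implicitly from data:

```python
# Traditional software: the logic is explicit in the code and can be reviewed.
def is_fraud_rule_based(amount: float, country: str) -> bool:
    return amount > 10_000 or country in {"XX", "YY"}

# Software that includes ML: the logic is implicit in the data.
# (Sketch using scikit-learn; features and labels are made up.)
from sklearn.tree import DecisionTreeClassifier

X = [[500, 0], [20_000, 0], [300, 1], [15_000, 1]]  # [amount, country_code]
y = [0, 1, 0, 1]                                    # the "rule" lives in these labels
model = DecisionTreeClassifier().fit(X, y)

# The decision boundary now lives in the fitted parameters, so the quality of
# the underlying data directly determines the quality of the output.
print(model.predict([[12_000, 0]]))
```

In the second case there is no threshold to review in the code; the behaviour is determined by the fitted parameters, which is exactly why data quality and drift become Ops concerns.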

MLOps JTBD

When working with software that includes Machine Learning, I want to be able to iterate on the models with confidence, so that I can deliver the most value to my users.

MLOps JTBD per stage

Plan

| Code | Job Description |
| --- | --- |
| IE_MLOPS_PL_1 | When starting a project, I want to decide when using Machine Learning becomes ROI positive, so that I don't optimize prematurely |
| IE_MLOPS_PL_2 | When starting to work with ML, I want to find knowledge from previous analyses done by colleagues, so that I can build upon them |

Create

| Code | Job Description |
| --- | --- |
| IE_MLOPS_CR_1 | When working on a Machine Learning model, before release, I want to share and discuss the code with colleagues, so that I catch bugs and bad assumptions |
| IE_MLOPS_CR_2 | When creating a Machine Learning model, I want to encode the training as a pipeline, so that I can minimize human error |
| IE_MLOPS_CR_3 | When optimising a model, I want to compare the outcomes of potential hyperparameter variations, so that I can choose the best candidate |
| IE_MLOPS_CR_4 | When creating a Machine Learning model, I want automatic creation of models from input data and the desired outcome, so that I can focus on modelling the data and the business case |
| IE_MLOPS_CR_5 | When creating a Machine Learning model, I need access to production data, so that my models are accurate |
| IE_MLOPS_CR_6 | When creating a Machine Learning model, I want access to synthetic data, so that I don't violate users' privacy |
| IE_MLOPS_CR_7 | When training a model, I want to run part of my code on one or more remote machines, so that I can finish training faster |
| IE_MLOPS_CR_8 | When training a new model, I want to explore existing models, so that I can build upon them |
| IE_MLOPS_CR_9 | When creating a Machine Learning model, I want to explore and process the available data, so that I can decide how to approach the problem |
| IE_MLOPS_CR_10 | When creating a Machine Learning model, I need to label my dataset, either manually or through crowdsourcing, so that I can define the target of the model |
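To make IE_MLOPS_CR_2 and IE_MLOPS_CR_3 more concrete, here is a minimal sketch assuming scikit-learn and a synthetic dataset: the training steps are encoded as a pipeline, and candidate hyperparameter variations are compared so the best one can be chosen.

```python
# Minimal sketch (assumed tooling: scikit-learn; the dataset is synthetic).
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=500, random_state=0)

# IE_MLOPS_CR_2: encode the training steps as a pipeline to minimize human error.
pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("clf", LogisticRegression(max_iter=1000)),
])

# IE_MLOPS_CR_3: compare potential hyperparameter variations and keep the best one.
search = GridSearchCV(pipeline, {"clf__C": [0.01, 0.1, 1.0, 10.0]}, cv=5)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```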

Verify

| Code | Job Description |
| --- | --- |
| IE_MLOPS_VR_1 | When the underlying data changes, I want to be informed, so that I can adapt the model |
| IE_MLOPS_VR_2 | When the code or data for the model changes, I want to rerun hyperparameter tuning, so that I can choose the best model |
| IE_MLOPS_VR_3 | When testing ML code, I want to run the training DAG as specified during Create, so that I don't duplicate work |
| IE_MLOPS_VR_4 | When working with ML, I want to schedule model training based on time or triggers, so that the model doesn't suffer performance degradation |
| IE_MLOPS_VR_5 | When working with ML, I want to minimize how often each step of the DAG is run, so that costs are reduced |
| IE_MLOPS_VR_6 | When a model is deployed, I want to verify whether the labeling was done correctly and is still valid, so that I am sure the targets were correct |
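A minimal sketch of IE_MLOPS_VR_1, assuming NumPy arrays and a deliberately naive mean-shift check (the threshold and statistic are illustrative, not a recommended drift test):

```python
# Minimal sketch of a data-change check; the threshold and statistic are
# illustrative assumptions, not a production drift test.
import numpy as np


def data_has_shifted(train_col: np.ndarray, live_col: np.ndarray,
                     threshold: float = 3.0) -> bool:
    """Flag a feature whose live mean drifts far from the training mean."""
    baseline_mean = train_col.mean()
    baseline_std = train_col.std() + 1e-9  # avoid division by zero
    drift = abs(live_col.mean() - baseline_mean) / baseline_std
    return drift > threshold


train = np.random.default_rng(0).normal(0.0, 1.0, 10_000)
live = np.random.default_rng(1).normal(4.0, 1.0, 1_000)  # simulated shift
if data_has_shifted(train, live):
    print("Underlying data changed: consider adapting or retraining the model.")
```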

Packaging

| Code | Job Description |
| --- | --- |
| IE_MLOPS_PG_1 | When working on software that includes Machine Learning, I want to package models in a common format, so that I avoid changing code on every release |
| IE_MLOPS_PG_2 | When multiple versions of a model exist, I want to trace the code, data, and configuration that were used to create each of them, so that I can reproduce the model |
| IE_MLOPS_PG_3 | When iterating on a model, I want to search and browse through past versions, so that I can deploy a past version |
| IE_MLOPS_PG_4 | When iterating on a model, I want to see the performance of past models, so that I can communicate progress over time |
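One way IE_MLOPS_PG_1 and IE_MLOPS_PG_2 could look in practice; the package layout, metadata fields, and use of joblib below are assumptions for illustration, not a prescribed format:

```python
# Minimal sketch of packaging a model together with its lineage.
# The directory layout and metadata fields are assumptions, not a standard format.
import hashlib
import json
import pathlib

import joblib


def package_model(model, data_path: str, config: dict, out_dir: str = "model_package"):
    out = pathlib.Path(out_dir)
    out.mkdir(exist_ok=True)
    joblib.dump(model, out / "model.joblib")  # serialized model in a common format
    metadata = {
        # which data and configuration produced this version (IE_MLOPS_PG_2)
        "data_sha256": hashlib.sha256(pathlib.Path(data_path).read_bytes()).hexdigest(),
        "config": config,                          # hyperparameters, feature list, etc.
        "code_version": config.get("git_commit"),  # e.g. the training commit SHA
    }
    (out / "metadata.json").write_text(json.dumps(metadata, indent=2))
    return out
```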

Release

| Code | Job Description |
| --- | --- |
| IE_MLOPS_RL_1 | When deploying an ML model, I want to deploy packaged models, so that I minimize human intervention |
| IE_MLOPS_RL_2 | When deploying an ML model, I want to be able to deploy multiple versions simultaneously, so that I can compare their behavior |
| IE_MLOPS_RL_3 | When a model is deployed, it should have access to the data in the same format as during training, so that it doesn't behave differently |
| IE_MLOPS_RL_4 | When a model is deployed, I want my data transformation pipelines to be orchestrated, so that the model has the right data available |
| IE_MLOPS_RL_5 | For deployed models, I want to batch predictions, so that I can avoid heavy computation at runtime |
| IE_MLOPS_RL_6 | For deployed models, I want to make predictions on streaming data, so that my model reacts quickly to new data |
| IE_MLOPS_RL_7 | When deploying a Machine Learning model, I need to compare the prior and current versions on live data, so that I can publish the best version |
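A minimal sketch of IE_MLOPS_RL_2 and IE_MLOPS_RL_7, assuming models expose a hypothetical scikit-learn-style predict() method: the current version keeps serving users while a candidate runs in shadow mode, and both predictions are logged for comparison on live data.

```python
# Minimal sketch: serve the current version, run the candidate in shadow mode,
# and log both predictions so they can be compared on live data.
# `predict` here is a hypothetical scikit-learn-style interface.
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("model_comparison")


def serve(features, current_model, candidate_model):
    served = current_model.predict([features])[0]    # answer actually returned to users
    shadow = candidate_model.predict([features])[0]  # candidate only observed, not served
    logger.info("features=%s current=%s candidate=%s", features, served, shadow)
    return served
```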

Secure

| Code | Job Description |
| --- | --- |
| IE_MLOPS_SC_1 | When using ML models as libraries (e.g. fine-tuning transformers), I want to be warned if the parameters were tampered with, so that I don't introduce unwanted behaviour |
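A minimal sketch of IE_MLOPS_SC_1, assuming the model author publishes a checksum for the pretrained weights; the file name and expected digest below are placeholders:

```python
# Minimal sketch: detect tampered model parameters by checking the weights file
# against a checksum published by the model author.
import hashlib
import pathlib

EXPECTED_SHA256 = "<published checksum of the pretrained weights>"


def weights_look_untampered(weights_path: str) -> bool:
    digest = hashlib.sha256(pathlib.Path(weights_path).read_bytes()).hexdigest()
    return digest == EXPECTED_SHA256


if not weights_look_untampered("pretrained_weights.bin"):
    raise RuntimeError("Checksum mismatch: refusing to fine-tune these parameters.")
```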

Configure

| Code | Job Description |
| --- | --- |
| IE_MLOPS_CF_1 | When working on software that includes Machine Learning, I want to control resources for each model accordingly, so that I provide the best user experience while keeping costs low |
| IE_MLOPS_CF_2 | When training or running a Machine Learning model, I want to specify the hardware configuration needed, so that I don't need to file a ticket for new resources |
| IE_MLOPS_CF_3 | When training or running a Machine Learning model, I want to not depend on a specific cloud provider, so that I avoid vendor lock-in |

Monitor

| Code | Job Description |
| --- | --- |
| IE_MLOPS_MN_1 | When the input data configuration changes, I want to be warned, so that I can retrain my Machine Learning model |
| IE_MLOPS_MN_2 | When the performance of a Machine Learning model degrades, I want to be informed, so that I can retrain the model |
| IE_MLOPS_MN_3 | When a deployed Machine Learning model is consuming too many resources, I want to be informed, so that I don't incur excessive cost |
| IE_MLOPS_MN_4 | After deploying a model, I want to run ad hoc analysis on the prediction data, so that I can explore the outcomes and potential improvements |
| IE_MLOPS_MN_5 | After deploying a model, I want to run ad hoc analysis on the prediction data, so that I can answer questions from the business |
| IE_MLOPS_MN_6 | When deploying a model, I want to have model performance broken down by cohorts of users, so that I can detect potential biases |
| IE_MLOPS_MN_7 | After deploying a model, I want to label predictions following the initial labeling process, so that I can verify the model works as expected |
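A minimal sketch of IE_MLOPS_MN_6, assuming prediction logs with hypothetical cohort, label, and predicted columns and accuracy as the metric:

```python
# Minimal sketch of cohort-level performance monitoring.
# Column names, cohorts, and the accuracy metric are illustrative assumptions.
import pandas as pd

predictions = pd.DataFrame({
    "cohort":    ["free", "free", "paid", "paid", "paid"],
    "label":     [1, 0, 1, 1, 0],
    "predicted": [1, 1, 1, 0, 0],
})

per_cohort_accuracy = (
    predictions
    .assign(correct=lambda df: df["label"] == df["predicted"])
    .groupby("cohort")["correct"]
    .mean()
)
print(per_cohort_accuracy)  # large gaps between cohorts can point to potential bias
```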

Resources