MLOps Jobs to Be Done
What are the problems users want MLOps to solve?
NOTE: The MLOps Incubation Engineering project has become the MLOps team. These pages are left for historical purposes, but are not actively maintained. Please refer to the MLOps team page for updated information.
What is this?
To better contextualize our efforts on MLOps Incubation Engineering, we are defining a list of Jobs To Be Done (JTBD) - the objectives our users are trying to accomplish in MLOps. We anchor each JTBD to the stage it belongs to, and each MLOps exploration will try to address one or more of these JTBD.
This is a living issue, and new JTBD will continue to be added based on our understanding of MLOps and user feedback.
Why is MLOps different from traditional DevOps?
While the goal of MLOps is the same as that of DevOps, the JTBD differ because software that includes ML (sometimes referred to as Software 2.0) is different from traditional software. In traditional software, the logic is made explicit through code; in ML, the logic is implicit in the data and extracted from it through a variety of techniques. Because the software relies on the data, the data becomes a first-class citizen for Ops (the quality of the underlying data directly impacts the quality of the software output), and because the logic is not explicit, it introduces different types of vulnerability and uncertainty that need to be addressed. Note that MLOps is not a branch of DevOps, it's a superset of it.
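To make the contrast concrete, here is a minimal sketch, assuming scikit-learn is installed; the messages, labels, and rule are made up for illustration. The first function encodes the logic explicitly in code, while the second extracts it from data by training.

```python
# Traditional software: the logic is explicit in the code.
def is_spam_rule_based(message: str) -> bool:
    # A human wrote this rule; changing the behaviour means changing the code.
    return "free money" in message.lower()


# Software that includes ML: the logic is implicit in the data
# and is extracted from it by training.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

messages = ["free money now", "win free money", "meeting at noon", "lunch tomorrow?"]
labels = [1, 1, 0, 0]  # 1 = spam, 0 = not spam (toy labels)

model = make_pipeline(CountVectorizer(), LogisticRegression())
model.fit(messages, labels)  # the "rules" now live in the fitted parameters

print(is_spam_rule_based("Free money inside"))  # True, by explicit rule
print(model.predict(["free money inside"])[0])  # learned from the toy data
```

Changing the behaviour of the first function means editing code; changing the behaviour of the second means changing the data or retraining, which is exactly why the data itself needs to be versioned, tested, and monitored.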
MLOps JTBD
When working with software that includes Machine Learning, I want to be able to iterate on the models with confidence, so that I can deliver the most value to my users.
MLOps JTBD per stage
Plan
| Code | Job Description |
| ---- | --------------- |
| IE_MLOPS_PL_1 | When starting a project, I want to decide when using Machine Learning becomes ROI positive, so that I don't optimize prematurely |
| IE_MLOPS_PL_2 | When starting to work with ML, I want to find knowledge from previous analyses done by colleagues, so that I can build upon them |
Create
| Code | Job Description |
| ---- | --------------- |
| IE_MLOPS_CR_1 | When working on a Machine Learning model, before release, I want to share and discuss the code with colleagues, so that I catch bugs and bad assumptions |
| IE_MLOPS_CR_2 | When creating a machine learning model, I want to encode the training as a pipeline, so that I can minimize human error (see the sketch after this table) |
| IE_MLOPS_CR_3 | When optimising a model, I want to compare the outcomes of potential hyperparameter variations, so that I can choose the best candidate |
| IE_MLOPS_CR_4 | When creating a Machine Learning model, I want automatic creation of models from input data and desired outcome, so that I can focus on modelling the data and the business case |
| IE_MLOPS_CR_5 | When creating a Machine Learning model, I need access to production data, so that my models are accurate |
| IE_MLOPS_CR_6 | When creating a Machine Learning model, I want access to synthetic data, so that I don't violate users' privacy |
| IE_MLOPS_CR_7 | When training a model, I want to run part of my code on one or more remote machines, so that I can finish training faster |
| IE_MLOPS_CR_8 | When training a new model, I want to explore existing models, so that I can build upon them |
| IE_MLOPS_CR_9 | When creating a machine learning model, I want to explore and process the data available, so that I can decide how to approach the problem |
| IE_MLOPS_CR_10 | When creating a machine learning model, I need to label my dataset, either manually or via crowdsourcing, to define the target of the machine learning model |
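As an illustration of IE_MLOPS_CR_2 and IE_MLOPS_CR_3, the sketch below assumes scikit-learn and its bundled Iris dataset; the pipeline steps and hyperparameter grid are illustrative choices, not a prescribed setup.

```python
# Illustrative sketch for IE_MLOPS_CR_2 and IE_MLOPS_CR_3 using scikit-learn.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)

# Encoding training as a pipeline minimizes human error: preprocessing and
# fitting always run in the same order, with the same parameters.
pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("clf", LogisticRegression(max_iter=1000)),
])

# Compare candidate hyperparameter variations and keep the best one.
search = GridSearchCV(pipeline, param_grid={"clf__C": [0.1, 1.0, 10.0]}, cv=5)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```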
Verify
| Code | Job Description |
| ---- | --------------- |
| IE_MLOPS_VR_1 | When the underlying data changes, I want to be informed, so that I can adapt the model (see the sketch after this table) |
| IE_MLOPS_VR_2 | When code or data for the model changes, I want to rerun hyperparameter tuning, so that I can choose the best model |
| IE_MLOPS_VR_3 | When testing ML code, I want to run the training DAG as specified during Create, so that I don't duplicate work |
| IE_MLOPS_VR_4 | When working with ML, I want to schedule model trainings based on time or triggers, so that the model doesn't suffer performance degradation |
| IE_MLOPS_VR_5 | When working with ML, I want to minimize how often each step of the DAG is run, so that costs are reduced |
| IE_MLOPS_VR_6 | When a model is deployed, I want to verify whether the labeling was done correctly and is still valid, so that I am sure the targets were correct |
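A minimal sketch of IE_MLOPS_VR_1, assuming NumPy and SciPy are available; the synthetic feature values and the drift threshold are assumptions for illustration.

```python
# Illustrative sketch for IE_MLOPS_VR_1: compare a feature's training-time
# distribution against fresh production data and flag drift.
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)
training_feature = rng.normal(loc=0.0, scale=1.0, size=1_000)    # seen at training time
production_feature = rng.normal(loc=0.5, scale=1.0, size=1_000)  # seen in production

statistic, p_value = ks_2samp(training_feature, production_feature)
if p_value < 0.01:  # the threshold is a judgment call, not a standard
    print(f"Data drift detected (KS statistic = {statistic:.3f}); consider adapting the model.")
```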
Packaging
| Code | Job Description |
| ---- | --------------- |
| IE_MLOPS_PG_1 | When working on software that includes Machine Learning, I want to package models in a common format, so that I avoid changing code on every release (see the sketch after this table) |
| IE_MLOPS_PG_2 | When multiple versions of a model exist, I want to trace the code, data, and configuration that were used to create each of them, so that I can reproduce the model |
| IE_MLOPS_PG_3 | When iterating on a model, I want to search and browse through past versions, so that I can deploy a past version |
| IE_MLOPS_PG_4 | When iterating on a model, I want to see the performance of past models, so that I can communicate progress over time |
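One way IE_MLOPS_PG_1 and IE_MLOPS_PG_2 could look in practice, sketched with scikit-learn and joblib; the directory layout, metadata fields, and placeholder commit SHA are assumptions, not a prescribed packaging format.

```python
# Illustrative sketch for IE_MLOPS_PG_1 and IE_MLOPS_PG_2: package a trained
# model together with the metadata needed to reproduce it.
import hashlib
import json
from pathlib import Path

import joblib
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000).fit(X, y)

version_dir = Path("model_registry/iris-classifier/v3")  # placeholder layout
version_dir.mkdir(parents=True, exist_ok=True)

joblib.dump(model, version_dir / "model.joblib")  # the packaged model artifact
(version_dir / "metadata.json").write_text(json.dumps({
    "code_commit": "abc1234",                              # placeholder commit SHA
    "data_hash": hashlib.sha256(X.tobytes()).hexdigest(),  # fingerprint of the training data
    "params": model.get_params(),                          # configuration used for training
}, default=str, indent=2))
```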
Release
| Code | Job Description |
| ---- | --------------- |
| IE_MLOPS_RL_1 | When deploying an ML model, I want to deploy packaged ones, so that I minimize human intervention |
| IE_MLOPS_RL_2 | When deploying an ML model, I want to be able to deploy multiple versions simultaneously, so that I can compare their behavior (see the sketch after this table) |
| IE_MLOPS_RL_3 | When a model is deployed, it should have access to the data in the same format as during training, so that it doesn't behave differently |
| IE_MLOPS_RL_4 | When a model is deployed, I want my data transformation pipelines to be orchestrated, so that the model has the right data available |
| IE_MLOPS_RL_5 | For deployed models, I want to batch predictions, so that I can avoid heavy computation during runtime |
| IE_MLOPS_RL_6 | For deployed models, I want to make predictions on streaming data, so that my model reacts quickly to new data |
| IE_MLOPS_RL_7 | When deploying a machine learning model, I need to compare the prior and current versions on live data, so that I can publish the best version |
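A minimal sketch of IE_MLOPS_RL_2 and IE_MLOPS_RL_7 in plain Python: two model versions are evaluated on the same live inputs, only one answers the user, and both predictions are logged for comparison. The stand-in models and in-memory log are placeholders for real deployments and real logging.

```python
# Illustrative sketch for IE_MLOPS_RL_2 and IE_MLOPS_RL_7: serve two model
# versions on the same live inputs; only one answers the user, both are logged.
import random


def model_v1(features):
    # Stand-in for the currently released model version.
    return sum(features) > 1.0


def model_v2(features):
    # Stand-in for the candidate model version.
    return sum(features) > 0.8


def predict(features, prediction_log):
    primary = model_v1(features)  # answer returned to the user
    shadow = model_v2(features)   # evaluated silently on the same input
    prediction_log.append({"features": features, "v1": primary, "v2": shadow})
    return primary


log = []
for _ in range(5):
    predict([random.random(), random.random()], log)

# Offline comparison of the logged v1 vs v2 predictions decides which version to publish.
print(log)
```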
Secure
| Code | Job Description |
| ---- | --------------- |
| IE_MLOPS_SC_1 | When using ML models as libraries (e.g. fine-tuning transformers), I want to be warned whether the parameters were tampered with, so that I don't introduce unwanted behaviour (see the sketch after this table) |
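A minimal sketch of IE_MLOPS_SC_1: verify downloaded model parameters against a published checksum before loading them. The file name and the expected hash are placeholders.

```python
# Illustrative sketch for IE_MLOPS_SC_1: refuse to load model parameters whose
# checksum does not match the one published alongside the model.
import hashlib
from pathlib import Path

EXPECTED_SHA256 = "0123abcd..."  # placeholder: the checksum published with the model


def weights_are_untampered(path: Path) -> bool:
    digest = hashlib.sha256(path.read_bytes()).hexdigest()
    return digest == EXPECTED_SHA256


weights = Path("pretrained_model.bin")  # placeholder file name
if weights.exists() and not weights_are_untampered(weights):
    raise RuntimeError("Model parameters do not match the published checksum; refusing to load.")
```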
Configure
| Code | Job Description |
| ---- | --------------- |
| IE_MLOPS_CF_1 | When working on software that includes Machine Learning, I want to control resources for each model accordingly, so that I provide the best user experience while keeping costs low |
| IE_MLOPS_CF_2 | When training or running a machine learning model, I want to specify the hardware configurations needed, so that I don't need to file a ticket for new resources |
| IE_MLOPS_CF_3 | When training or running a machine learning model, I want to not depend on a specific cloud provider, so that I avoid vendor lock-in |
Monitor
| Code | Job Description |
| ---- | --------------- |
| IE_MLOPS_MN_1 | When the input data configuration changes, I want to be warned, so that I can retrain my machine learning model |
| IE_MLOPS_MN_2 | When the performance of a Machine Learning model degrades, I want to be informed, so that I can retrain my machine learning model |
| IE_MLOPS_MN_3 | When a deployed machine learning model is consuming too many resources, I want to be informed, so that I don't incur excessive cost |
| IE_MLOPS_MN_4 | After deploying a model, I want to run ad-hoc analysis on the prediction data, to explore the outcomes and potential improvements |
| IE_MLOPS_MN_5 | After deploying a model, I want to run ad-hoc analysis on the prediction data, so that I can answer questions from the business |
| IE_MLOPS_MN_6 | When deploying a model, I want to have model performance divided by cohorts of users, so that I can detect potential biases (see the sketch after this table) |
| IE_MLOPS_MN_7 | After deploying a model, I want to label predictions following the initial labeling process, so that I can verify the model works as expected |
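A minimal sketch of IE_MLOPS_MN_6 (and the alerting side of IE_MLOPS_MN_2), assuming pandas; the cohort labels, predictions, and threshold are made up for illustration.

```python
# Illustrative sketch for IE_MLOPS_MN_6: slice prediction accuracy by user
# cohort to surface potential biases, and alert when a cohort degrades.
import pandas as pd

predictions = pd.DataFrame({
    "cohort":    ["A", "A", "B", "B", "B"],  # made-up cohort labels
    "predicted": [1, 0, 1, 1, 0],
    "actual":    [1, 0, 0, 1, 1],
})

accuracy_by_cohort = (
    predictions.assign(correct=lambda df: df["predicted"] == df["actual"])
               .groupby("cohort")["correct"]
               .mean()
)
print(accuracy_by_cohort)

# The minimum acceptable accuracy is an assumption; in practice it is agreed per model.
if (accuracy_by_cohort < 0.7).any():
    print("Model performance degraded for at least one cohort; consider retraining.")
```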
Resources