I recently had the opportunity to speak at the Analytics Institute Ireland on a topic I'm passionate about, and I thought I'd share my presentation in a blog post.
If you'd like to go straight to the slides, you can view them here.
MLOps interest over time
MLOps, short for Machine Learning Operations, is a practice that has recently surged in popularity among analytics professionals. If you are already familiar with it, this post probably isn't the best use of your time. But if you've never heard of it, hopefully by the end you'll have an idea of the challenges it attempts to overcome.
I think the best way for me to explain some of the problems we see with our customers is to share a story. This is where Bob comes into our world.
This is Bob
Bob's a data scientist. Those who know me, or who can see my picture on Medium or LinkedIn, will notice that Bob and I share similar hairstyles 🙂
Bob has recently joined ACME Ltd as part of a small but rapidly growing team of data scientists. ACME Ltd is a telecoms business that has identified the potential of machine learning. So far, it has only taken what I see as the first step in exploiting the data it is sitting on. Like most companies, it began by using data to make decisions: building Business Intelligence dashboards to inform decision-makers, getting to know its customers, and starting to define a data strategy.
ACME Ltd is now ready to start using machine learning. It wants to move to step 2, where an organization incorporates AI/ML into its products and services in order to differentiate itself. So far it has only used machine learning in a development/lab environment: somewhere safe and not customer-facing. Bob's about to change that.
Bob sits down in his manager's office and is given his first project. He is to use AI/ML to predict when a customer of ACME Ltd is likely to leave (or, to give it its proper business name, "churn") and then display that likelihood in the CRM application the sales reps use. The sales reps can then work their magic to try to stop that customer from leaving (offering discounts, upgrades, etc.).
What most people think an ML project looks like
So Bob sets off on his adventure and follows the process above, a simple machine learning model lifecycle. ACME Ltd has followed this lifecycle for a number of years on its BI projects and its experimental ML projects, so it should work for getting Bob's project done too, right?
First he gets a definition of the business goal: reduce churn among ACME Ltd's customers. He then sets out to access and understand the data, so he talks to a data engineer to get the data he needs, cleans it, removes missing values, and makes it ready for machine learning. Then Bob gets to the fun part: building his model.
Bob builds his model and, once he is happy with the results, showcases it to his manager and the sales manager. He shows an example of 5000 customers and their likelihood of churning over the next 12 months. The sales manager's eyes glisten with joy. "How long till we can get this integrated into our CRM, Bob?" Bob has never put an ML model into production before, nor has he ever integrated a model into a traditional application like a CRM. He ponders the question for a moment and replies, "2 months".
Let’s have a quick summary of the events so far:
- Bob’s gotten his business goal.
- He’s gotten access to the data and has cleaned it.
- He’s built a model that works as expected.
So far this has taken Bob around 3 months. He estimates that the remaining steps, deploying the model to production and monitoring it, will take another 2 months.
In reality, it takes Bob another 9 months. That's 9 months of talking to the CRM developers to ensure the model integrates properly with the application, to the operations people who will host and look after the model once Bob is done, and to the security people who want to understand what controls are in place to keep the model, and more importantly the data it processes, secure.
He finally gets the model into production and integrated into the CRM a year after he started. He turns to the sales manager and says, "Here you go". But the business has changed at ACME Ltd: it's moving into new markets, and the sales manager no longer cares about losing existing customers, because they're chasing different ones anyway. Bob has spent a year of his life on a project that makes no difference, and has arguably wasted the time of the developers, operations, and security people too. He then leaves ACME Ltd, with no process in place to look after the model that is now in production.
Bob's story is similar to many we've heard from our customers: they just can't get their models into production quickly and safely. This is where MLOps comes in, as the practice helps address these challenges.
Let’s define MLOps
Can you get models into production without MLOps?
So you might be thinking, "Jamie, we don't really do MLOps, but we're getting our machine learning into production just fine." So can you do AI/ML in production without MLOps? Of course you can.
But as you scale up your investment in this area, you are going to have multiple models in production and multiple data scientists working on different models, and it will quickly become a huge mess that you will probably have to clean up. MLOps should be viewed as a factory approach to machine learning: it replaces the artisan approach that so many companies take with defined processes, tooling, and practices.
What an actual machine learning project will look like in production
What you'll find is that the process looks a bit like this. It involves multiple steps, multiple people from different teams, and multiple points where it can all go horribly wrong. Imagine having to repeat this manually for every project, without any automation. This is where MLOps tooling comes into play: we need to automate as much of this process as possible, because if we don't, we run into the problems below.
One issue I want to focus on is scaling. Putting one model into production successfully is fine, but once the business sees that success, it will want more models pushed into production, and a manual process that took Bob a year for a single model will not stretch to many.
High-level implementation of MLOps
"Great, Jamie, we understand this thing called MLOps. How do we go and implement it?" Well, sadly there's no well-drawn-out map. MLOps isn't something you can buy. You can obviously buy platforms and tooling that enable it, but fundamentally it comes down to people and the way they operate within the organization.
Fundamentally, MLOps is about creating processes and flows, and in practice those flows should mirror how your business communicates. One point I want to highlight: making it easy for data scientists to experiment is incredibly important. Data scientists are scientists; they need to be able to form hypotheses, test them, and discover new things in an environment that is safe and not production-facing. On the flip side, the tooling and processes need to be in place to quickly take an experiment and release it into production safely.
How this might look from a systems perspective is shown above. One platform we offer that can help you build a system like this is Red Hat OpenShift Data Science, or Open Data Hub. I've written about deploying Open Data Hub here. And if you ever want to talk about MLOps, or ask a question I haven't covered here, please feel free to reach out to me on LinkedIn.
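To make "release it into production safely" a little more concrete, here is a minimal Python sketch of an automated release gate, the kind of check a pipeline might run before promoting an experimental model. It encodes the questions Bob spent months answering by hand: does it integrate, has security approved it, and is it better than what's already running? The `Candidate` structure, its fields, the AUC metric and the `min_gain` threshold are all hypothetical choices for illustration, not the API of Open Data Hub or any other platform.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Candidate:
    """Hypothetical record describing a model that came out of experimentation."""
    name: str
    auc: float               # validation AUC on a held-out set (assumed metric)
    tests_passed: bool       # e.g. integration tests against a staging CRM
    security_approved: bool  # e.g. sign-off from the security team


def release_gate(candidate: Candidate,
                 production: Optional[Candidate],
                 min_gain: float = 0.01) -> List[str]:
    """Hypothetical gate: return the reasons blocking release (empty list = promote)."""
    blockers: List[str] = []
    if not candidate.tests_passed:
        blockers.append("integration tests failed")
    if not candidate.security_approved:
        blockers.append("missing security approval")
    # Only compare metrics if a model is already serving in production.
    if production is not None and candidate.auc < production.auc + min_gain:
        blockers.append(
            f"AUC {candidate.auc:.3f} does not beat production "
            f"{production.auc:.3f} by at least {min_gain}"
        )
    return blockers


prod = Candidate("churn-v1", auc=0.78, tests_passed=True, security_approved=True)
new = Candidate("churn-v2", auc=0.82, tests_passed=True, security_approved=True)
print(release_gate(new, prod))  # → []
```

Returning a list of blockers rather than a bare yes/no means a rejected release explains itself, so the data scientist knows whether to go back to experimenting or to chase an approval.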
Additional reading resources
- Introducing MLOps — O’Reilly Book
- MLOps: What It Is, Why It Matters, and How to Implement It — Neptune AI
- MLOps — Guides around implementation & governance