ML model deployment can seem like an onerous process, especially for teams with limited engineering resources. We’ve spoken to hundreds of data science teams to lay out the 9 key questions you need to ask when you’re ready to deploy your ML model into production. And yes, all 9 are super important.
Here’s what you need to be able to answer before you’re ready to choose the right hosting solution for your model:
1. How will you package the model?
Good model deployments are repeatable model deployments. To start, you’ll need to collect all of the dependencies of your model so that they can be installed the same way, every time. If you don’t, you’ll end up in a situation where the model works on your machine but not on the model hosting server.
Most teams do this with Docker, but it’s not always easy. You’ll need to convert your notebook into a Python script, collect all the package dependencies into a requirements.txt, and build a Dockerfile that copies and installs these (and any other dependencies) into the version of Python and Linux that’s best suited for your ML model.
You’ll also need some way for the model to be called from the Docker container, which often means creating a Flask app to handle inputs and outputs from the model.
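To make that concrete, here’s a minimal sketch of such a Flask wrapper, assuming a scikit-learn-style model that was pickled to a file named model.pkl (the filename, route, and field names are all illustrative):

```python
# app.py -- a minimal inference wrapper (illustrative names and paths)
import pickle

from flask import Flask, jsonify, request

app = Flask(__name__)

# Load the trained model once at startup, not on every request.
with open("model.pkl", "rb") as f:
    model = pickle.load(f)

@app.route("/predict", methods=["POST"])
def predict():
    # Expects a JSON body like {"features": [[1.0, 2.0, 3.0]]}
    payload = request.get_json(force=True)
    prediction = model.predict(payload["features"])
    return jsonify({"prediction": prediction.tolist()})

if __name__ == "__main__":
    # For local testing only; run a production WSGI server in the container.
    app.run(host="0.0.0.0", port=8080)
```

Your Dockerfile would then copy this script, the model file, and requirements.txt into the image, and start the app with a production server such as gunicorn rather than Flask’s built-in development server.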
2. Where will the model be hosted?
Now that you have a Docker image with your model and all its dependencies, it’s time to host it! Typically data science teams can’t deploy on the same infrastructure as the product engineering teams, and that’s a good thing: product engineering teams tend to move a lot slower than data science teams, which would make releasing and iterating on new versions of models much harder and slower.
Depending on the characteristics of your ML model you may need a large server with lots of RAM and maybe GPUs, or perhaps you can use something small and serverless. Setting up your own hosting infrastructure also means figuring out inbound and outbound network connectivity, DNS settings, and the various settings and permissions for building and pulling the Docker image.
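If you’re sizing your own servers, a small fail-fast check at container startup can save debugging time later. Here’s a rough sketch, assuming a Linux host and made-up thresholds you’d tune to your own model:

```python
# startup_check.py -- fail fast if the host can't fit the model
# (thresholds are illustrative; tune them to your model's footprint)
import os
import shutil
import sys

MIN_TOTAL_RAM_GB = 8   # enough to load the model plus request overhead
MIN_FREE_DISK_GB = 5   # room for the image, model artifact, and logs

def total_ram_gb() -> float:
    # Works on Linux; other platforms need a different approach.
    return os.sysconf("SC_PHYS_PAGES") * os.sysconf("SC_PAGE_SIZE") / 1e9

if total_ram_gb() < MIN_TOTAL_RAM_GB:
    sys.exit(f"Host has less than {MIN_TOTAL_RAM_GB} GB of RAM; refusing to start.")

if shutil.disk_usage("/").free / 1e9 < MIN_FREE_DISK_GB:
    sys.exit(f"Less than {MIN_FREE_DISK_GB} GB of free disk; refusing to start.")

print("Host resources look sufficient.")
```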
3. Who will maintain the hosting environments?
The hosting environment will need to be monitored and occasionally upgraded. Product engineering teams typically don’t use Python, and rarely have time to manage extra infrastructure, so you’ll need your own experts who can manage and maintain your model’s Python environments.
Yes, environments, plural! As the famous expression goes, “two is one; one is none.” Servers crash, and so you need at least two servers so that your model doesn’t have an outage if one of the servers goes down. Of course, if one of your servers goes down, you’ll need to be alerted so you can fix it, or bring up a new one.
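That alerting can start as something as small as the sketch below: a script run on a schedule that pings each replica’s health endpoint. The replica URLs and the alert destination are placeholders.

```python
# healthcheck.py -- ping each replica and alert if any are down
# (replica URLs and the alerting hook are illustrative placeholders)
import urllib.request

REPLICAS = [
    "http://model-host-1.internal:8080/health",
    "http://model-host-2.internal:8080/health",
]

def is_healthy(url: str) -> bool:
    try:
        with urllib.request.urlopen(url, timeout=5) as resp:
            return resp.status == 200
    except OSError:
        return False

down = [url for url in REPLICAS if not is_healthy(url)]
if down:
    # Replace this print with a real alert: a Slack webhook, PagerDuty, etc.
    print(f"ALERT: {len(down)} replica(s) unreachable: {down}")
```

Run something like this from a cron job, or lean on your cloud provider’s uptime checks, so a dead replica gets noticed in minutes rather than days.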
More than one person on your team needs to share responsibility for managing the environment. Don’t let the “one is none” rule bite you with an extended outage because the one person who knows how to recover the model hosting server is on vacation with Slack very intentionally placed on silent. We all deserve a break - except for your ML model that you’ve deployed into production.
4. What happens when the model has a problem?
Just like servers, models can crash. Sometimes the process running the model will run out of RAM and get killed by the operating system. Or perhaps the model throws a particularly bad exception and crashes the Python process with a segfault. In any event, you’ll need monitoring for these events, and recovery logic to keep the model serving through these problematic events.
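The recovery half of that can start very simply: a supervisor loop that restarts the serving process whenever it dies. Here’s a minimal sketch, with the serve command as a placeholder (in production, systemd, Kubernetes, or Docker restart policies do this job more robustly):

```python
# supervisor.py -- restart the model server whenever its process exits
# (the serve command is a placeholder for however you start your model)
import subprocess
import time

SERVE_COMMAND = ["python", "app.py"]  # illustrative

while True:
    process = subprocess.Popen(SERVE_COMMAND)
    exit_code = process.wait()  # blocks until the server crashes or exits
    # Log the failure so monitoring can pick it up, then restart after a pause.
    print(f"Model server exited with code {exit_code}; restarting in 5s")
    time.sleep(5)
```

An OOM kill will usually surface here as a negative exit code (the signal number), which is worth logging and alerting on so you know to resize the server or shrink the model.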
5. Where will the model’s logs get sent and stored?
At a minimum you’ll want logs of what inputs were used to call the model, how long the model took to respond, and what the result was. As your use cases get more advanced, you’ll also want logging around which version of the model was used, and other related data points.
These logs need to get sent from your hosting environment to someplace where you can search and analyze them. Frequently logs are used for alerting, so your monitoring infrastructure is likely going to need to be paired with your logging infrastructure.
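Inside the serving code, this can start as one structured JSON line per request, which most log aggregators can ingest and index directly. A sketch, with illustrative field names and a hard-coded version string you’d replace with your build metadata:

```python
import json
import logging
import time
import uuid

logger = logging.getLogger("model_requests")
logging.basicConfig(level=logging.INFO)

MODEL_VERSION = "v1.3.0"  # illustrative; read this from your build metadata

def predict_with_logging(model, features):
    request_id = str(uuid.uuid4())
    start = time.time()
    prediction = model.predict(features)
    # One JSON line per request: easy to ship, search, and alert on later.
    logger.info(json.dumps({
        "request_id": request_id,
        "model_version": MODEL_VERSION,
        "inputs": features,
        "prediction": prediction.tolist(),
        "latency_ms": round((time.time() - start) * 1000, 1),
    }))
    return prediction
```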
6. How will new versions of the model get deployed?
You’ll want to deploy new versions of your model after retraining, or after tweaking hyperparameters in a way that boosts quality. To minimize the time it takes, and the risk of errors from manual steps, you’ll need to build automation that packages and deploys your model with as little human input as possible. A good, if complicated, template to follow here is the CI/CD process used by engineering teams.
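As a starting point, that automation can be a small script your CI system runs on every release, like the sketch below. The registry and image names are placeholders, and the final rollout step depends on your hosting setup.

```python
# deploy.py -- build, tag, and push a new model image
# (registry, image name, and the rollout step are illustrative)
import subprocess
import sys

REGISTRY = "registry.example.com/ml"   # placeholder
IMAGE = "churn-model"                  # placeholder

def run(cmd):
    print("+", " ".join(cmd))
    subprocess.run(cmd, check=True)

def deploy(version: str):
    tag = f"{REGISTRY}/{IMAGE}:{version}"
    run(["docker", "build", "-t", tag, "."])
    run(["docker", "push", tag])
    # The last step depends on your hosting: update a Kubernetes deployment,
    # restart a service, or point a load balancer at the new tag.
    print(f"Pushed {tag}; trigger your rollout step here.")

if __name__ == "__main__":
    deploy(sys.argv[1])  # e.g. python deploy.py v1.4.0
```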
7. How will model versions get rolled back to recover from a deployment problem?
Inevitably, if you deploy enough versions of your ML model, one of them will be bad and you’ll need to roll it back. Make sure you have a plan for rolling back bad versions of models, and test it on a regular basis so you’ll know it works in an emergency.
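If every release is pushed as an immutable image tag, as in the deploy sketch above, a rollback is mostly “re-point serving at the previous tag.” A hypothetical sketch:

```python
# rollback.py -- redeploy a previously pushed image tag
# (assumes every release was tagged and pushed, as in the deploy sketch)
import subprocess
import sys

REGISTRY = "registry.example.com/ml"   # placeholder
IMAGE = "churn-model"                  # placeholder

def rollback(previous_version: str):
    tag = f"{REGISTRY}/{IMAGE}:{previous_version}"
    subprocess.run(["docker", "pull", tag], check=True)
    # Point your serving infrastructure back at the old tag, e.g. by
    # restarting the container or updating your orchestrator's spec.
    print(f"Pulled {tag}; switch serving back to this tag.")

if __name__ == "__main__":
    rollback(sys.argv[1])  # e.g. python rollback.py v1.3.0
```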
8. What happens when the parameters sent to the model change? What about the response?
As you deploy more versions of your model you’ll make changes to improve its performance. Perhaps the newest version of the model expects more parameters, or returns multiple quality scores instead of one.
To make releasing changes like these easier, you’ll want the ability to host multiple versions of the same model at the same time, at different URLs. This way you can keep the old one running, and switch to the new one once your product is ready to send (and receive) new data.
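Extending the earlier Flask sketch, serving two versions side by side at different URLs might look like this (filenames, routes, and fields are illustrative):

```python
import pickle

from flask import Flask, jsonify, request

app = Flask(__name__)

# Load both versions at startup; filenames are illustrative.
with open("model_v1.pkl", "rb") as f:
    model_v1 = pickle.load(f)
with open("model_v2.pkl", "rb") as f:
    model_v2 = pickle.load(f)

@app.route("/v1/predict", methods=["POST"])
def predict_v1():
    payload = request.get_json(force=True)
    return jsonify({"score": model_v1.predict(payload["features"]).tolist()})

@app.route("/v2/predict", methods=["POST"])
def predict_v2():
    # v2 expects an extra field and returns multiple scores, so callers
    # switch to this URL only once they're ready to send the new data.
    payload = request.get_json(force=True)
    scores = model_v2.predict(payload["features"])
    return jsonify({"scores": scores.tolist(), "segment": payload.get("segment")})
```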
9. Who controls access to the model?
Last but never least, security. You’ll need API keys to control who is allowed to call the model, audit logs recording who made changes to it, and user permissions limiting who is allowed to make those changes. These permissions should be integrated into a user management system, so that it’s easy to onboard and offboard members of your team without risking the security of your deployed model.
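If you’re hosting the model yourself, the API-key half of that can start as a simple decorator on your prediction route, as in the sketch below. The header name and key storage are placeholders; in practice, keys and permissions should come from your user management system.

```python
import os
from functools import wraps

from flask import Flask, jsonify, request

app = Flask(__name__)

# Placeholder: in practice, issue and store keys in your user management
# system or a secrets store, not an environment variable.
VALID_API_KEYS = {k for k in os.environ.get("MODEL_API_KEYS", "").split(",") if k}

def require_api_key(view):
    @wraps(view)
    def wrapper(*args, **kwargs):
        if request.headers.get("X-API-Key") not in VALID_API_KEYS:
            return jsonify({"error": "unauthorized"}), 401
        return view(*args, **kwargs)
    return wrapper

@app.route("/predict", methods=["POST"])
@require_api_key
def predict():
    return jsonify({"status": "ok"})  # run the model here
```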
ML Model Deployment with Modelbit
At the beginning of this article we mentioned that we spoke to hundreds of data science teams about the challenges and key steps of ML model deployment. That’s because we’ve been building Modelbit to make it incredibly easy for data science teams to deploy their ML models into production by simply calling modelbit.deploy() in their data notebook. Modelbit handles everything you need to host your ML model, from packaging and inference to versioning and security. Once you deploy your ML model with Modelbit, you can call it from your product via a REST API.
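In practice that looks roughly like the sketch below, deploying a plain Python function straight from a notebook. The model and feature names are made up, and the exact authentication and deploy calls are covered in Modelbit’s docs.

```python
import modelbit
from sklearn.linear_model import LogisticRegression

# Train (or load) a model earlier in the notebook; this tiny one is illustrative.
model = LogisticRegression().fit([[1, 2], [3, 4], [5, 6]], [0, 0, 1])

def predict_churn(num_logins: float, days_since_signup: float) -> float:
    return float(model.predict_proba([[num_logins, days_since_signup]])[0][1])

# Packages the function and its dependencies and hosts it behind a REST API.
modelbit.deploy(predict_churn)
```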
Try it for free today or let us know if you have any questions.