Infrastructure as Code for Machine Learning

By Harry Glaser, Co-Founder & CEO


How we deploy software

I’m old enough to remember boxed software. In fact, I’m old enough to remember buying Nvidia chips in boxes to run my boxed software. In those days, deployment was such a laborious process that we only did it every three years or so. After a couple years of building features and fixing bugs, the release would approach. Everyone would shift into QA and validation mode, fixing the last bugs and making sure everything was working. Finally, the engineering lead would burn the gold master: the blueprint for the install CDs that would be delivered to manufacturing and eventually to stores.

The internet changed all that, of course. The first major change was that the “gold master” was no longer a CD but a binary delivered over the wire. In the early days of the internet, instead of sending the release versions to the printing presses, we’d upload them to the servers. Our customers could use their license keys to log in and download the latest releases. They’d install them into their workstations and datacenters at their leisure, just like they used to do from CDs.

Cloud computing changed this paradigm again. Now the customer didn’t have to install the software at all. They’d just visit our website, which would automatically be running the latest version of the software on the server. The release process was similar: Code for a while, build a version of the software, and then ship the new version to the servers. My last startup, founded over ten years ago now, worked this way. An engineer who was pushing a new release would hit the gong, so that we would all know a new version was going out. We’d watch the servers restart with the new version, crossing our fingers for no issues, and cheer when the first users got the new version.

Of course this paradigm shift, like paradigm shifts before it, was a major improvement. But it didn’t take full advantage of the scale that cloud computing offers us. Because pushing was still a manual process, it was error-prone. Because it was error-prone, we were careful about it, and didn’t do it that often. It also limited our scale: The number of servers, and the different kinds of servers, was limited by what we could reasonably manage as a small team. As the company grew, we hired a dedicated DevOps team to manage the complexity, risks and challenges.

Thankfully, modern teams don’t deploy this way anymore. Thanks to tooling like GitHub and Terraform, we can specify exactly the server configuration we need in configuration files that live in the source code itself. This brings innumerable benefits: The configuration can describe much larger, more complex infrastructure than humans could ever manage by hand, so the software can scale in far more interesting ways. We can test the server configurations directly, which makes them much more bulletproof. We can version control them, knowing which server changes were made in conjunction with which code changes, which makes troubleshooting and debugging far easier. All of this lets us move faster with confidence, shipping more often. The end result is more scalable, higher-quality software delivered at a much faster pace.
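To make this concrete, here is a hypothetical Terraform fragment in the spirit of the above. The resource name, AMI id, and fleet size are invented for illustration; the point is that the whole shape of the fleet lives in source control next to the code that runs on it:

```hcl
# Hypothetical example: the entire inference fleet is declared in version control.
resource "aws_instance" "inference" {
  count         = 12               # scale up by editing one number, reviewed like any code change
  ami           = "ami-0abc1234"   # invented AMI id
  instance_type = "g4dn.xlarge"
}
```

Changing this file goes through the same pull request, review, and test process as any other code change, which is exactly what makes it safer than hand-managed servers.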

Deploying Machine Learning To Production

Machine Learning models, especially the larger and larger neural nets that have revolutionized so much in the last few years, add two critical pieces of complexity to production deployments. 

The first is that the models themselves are very large artifacts: tens or hundreds of gigabytes of data. This makes them poor fits for Git, the version control system that usually manages all the source files and configuration files participating in the Infrastructure-as-Code system. It also makes them hard for modern Infrastructure-as-Code systems to manage in general, since uploading them can take many minutes.

The second is that each ML model must be run in the exact software – and, in some cases, hardware – environment in which it was trained. Specific versions of specific Python packages are required, and in some cases specific GPUs. Often, specifically defined clusters of machines working in concert are required. This adds new dimensions of complexity to Infrastructure-as-Code systems, which are used to making simplifying assumptions like, say, that every machine in the cluster can run the same container image.
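As an illustration of how exact these environments have to be, a single model’s dependencies are often pinned down to specific builds, including the CUDA variant of the deep learning framework. The versions below are invented for illustration:

```text
# requirements.txt for one model -- every version pinned, including the CUDA build
torch==2.1.2+cu118
torchvision==0.16.2+cu118
numpy==1.24.4
```

A second model trained a few months later may need an entirely different, incompatible set of pins, which is why one shared server image rarely works for a whole ML team.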

For this reason, Machine Learning is stuck outside the Infrastructure-as-Code movement looking in. If you’re an ML Engineer, it’s still 2003. We’re still copying big data files around, updating servers by hand, and crossing our fingers that everything will work in concert when the users get the new version.

No longer.

Infrastructure as Code, Meet Machine Learning

The pace of innovation in Machine Learning technology has been breathtaking. As an ML company ourselves, it’s been inspiring to have a front-row seat to the innovative competition in LLMs, the new Computer Vision models seemingly every week, the multi-modal models combining both technologies, and more. Even more inspiring is seeing what our customers do with these models, from bespoke medical studies that help us live longer to automatic school shooter detection that literally saves the lives of our kids.

Unfortunately, the limitations teams face when deploying these models – the fact that they’re stuck in 2003 – are, in our view, unacceptably constraining what should be a major time of improvement and change built on these models. Teams face an uphill battle in deploying new versions or testing new technologies, thanks to the backward-looking nature of the tools available.

That’s why we built Modelbit. To bring Infrastructure as Code to Machine Learning teams. To give them the same scalability, reliability and velocity that their peers in Cloud Software enjoy. To let them spend more time building models, and less time uploading models to servers and manually rebooting them.

It’s a simple, straightforward idea that required a couple of technical leaps.

First, we had to make Git work for ML models. This means embracing and extending the technology that is the lingua franca for version control and CI/CD in industry today. Our infrastructure comes with a package that extends Git by automatically uploading any large model files to a high-performance model store that is optimized for serving. 

This is totally transparent to our users: git push automatically handles this, and git pull automatically pulls the files back down for editing. They are kept out of the repo itself, to avoid bogging it down, but still versioned and branched exactly as our customers expect and require.
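Mechanically, this is similar in spirit to a Git clean/smudge filter (the approach Git LFS popularized): large files are swapped for small pointer files on the way into the repo, and resolved back into real files on the way out. The sketch below is a hypothetical illustration of that idea, not Modelbit’s actual implementation; the store path and pointer format are invented:

```python
import hashlib
import shutil
from pathlib import Path

# Hypothetical stand-in for a remote, serving-optimized model store.
MODEL_STORE = Path("/tmp/model-store")

def clean(path: Path) -> str:
    """On the way into the repo (git add/push): upload the large file to the
    model store and return a small pointer, which is what gets committed."""
    data = path.read_bytes()
    digest = hashlib.sha256(data).hexdigest()
    MODEL_STORE.mkdir(parents=True, exist_ok=True)
    shutil.copy(path, MODEL_STORE / digest)  # "upload" to the store
    return f"model-pointer sha256:{digest} size:{len(data)}\n"

def smudge(pointer: str) -> bytes:
    """On the way out of the repo (git pull/checkout): resolve the pointer
    back into the real model bytes from the store."""
    digest = pointer.split("sha256:")[1].split()[0]
    return (MODEL_STORE / digest).read_bytes()
```

Because only the pointer lives in the repo, history stays small, while the content-addressed digest keeps every model version retrievable and branchable.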

Second, we serve our web application directly from the git repo. The web app lets customers manage all their running models, view the logs, set up alerting, run A/B tests and more. All of these configurations are stored in the repo itself. If a user clicks a button to change from one GPU to another, that’s a git commit. If they change the weighting of an A/B test, that’s a git commit. All of this requires a high-performance backend that looks like an ORM to the web application, but looks like a Git client to the repo.

From there, there’s lots of feature work: Per-model configuration files that control what infrastructure gets booted. Full containerization and isolation of the model deployments themselves. And we haven’t even begun to talk about high-performance inference serving.
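For instance, a per-model configuration file might declare everything that deployment needs booted for it. The YAML below is a hypothetical illustration of the idea, not an actual schema:

```yaml
# models/churn-predictor/deployment.yaml (hypothetical schema)
runtime:
  python: "3.10"
  requirements: requirements.txt   # pinned packages for this model only
compute:
  gpu: a10g
  replicas: 3
```

Because this file lives in the repo, changing a GPU type or replica count is versioned, reviewable, and testable like any other code change.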

The key thing to understand, though, is this: Modern software innovation was unlocked by the Infrastructure-as-Code movement that lets small teams of software engineers orchestrate huge clusters of servers. The same unlock is coming to Machine Learning teams. We’re proud to be the team to deliver it.

Deploy Custom ML Models to Production with Modelbit

Join other world-class machine learning teams deploying customized machine learning models to REST Endpoints.
Get Started for Free