Diving into the world of machine learning and AI, whether as an engineer, data scientist, or AI specialist, often means one thing: bringing your ML models to life through deployment to production environments.
Every week, there's a new "Eureka!" moment in AI, signaling a breakthrough. Yet, crafting these state-of-the-art models is only the starting point. The true measure of their worth unfolds when they're seamlessly integrated into production, addressing real-world problems and enhancing business solutions. Picking the right tools for deployment can make all the difference in ensuring the effectiveness of your models.
There are vital questions that arise when selecting your deployment strategy:
- Can the model scale to millions of users while maintaining peak performance?
- Can the model integrate seamlessly into existing workflows?
- Can my ML platform meet my model’s infrastructure demands? Carefully consider the memory, CPU, GPU, and instance type your models require.
- How cost-effective will it be to deploy your models?
Addressing these questions is essential before settling on a deployment strategy. Today, end-to-end ML platforms, like Modelbit and SageMaker, as well as open source solutions, like RayServe, BentoML and Seldon, are available to ease the deployment phase.
While several options for end-to-end ML model deployment platforms have come onto the scene over the last few years, Amazon SageMaker has long stood as the de facto choice for ML teams who want to consolidate on to one platform. Yet, you do not need to dive too far into the various ML communities on Slack or Reddit to learn that there isn’t exactly a universal love for SageMaker, and that an alternative to SageMaker is in demand.
In this in-depth comparison, we will dissect the capabilities, workflows, pricing structures, and real-world use cases of Amazon SageMaker and Modelbit. By the end of this article, you'll have the knowledge needed to make an informed decision when choosing between both tools for your machine learning models.
This comparison will also guide you through the necessary steps to deploy an ML model in both SageMaker, as well as its alternative, Modelbit.
Pain Points ML Engineers Face When Deploying ML Models Through Amazon SageMaker
Amazon SageMaker is marketed as a comprehensive machine learning platform that runs on the AWS Cloud. It provides an ecosystem to build, train, and deploy machine learning models for any use case with fully managed infrastructure, tools, and workflows.
Machine learning and AI engineers will, however, face some unique challenges when deploying machine learning models to production environments with Amazon SageMaker. In this section, you will learn about the limitations most ML engineers run into when using SageMaker, drawn from real-world examples, personal experiences, and user feedback.
Here are four pain points we hear users face when deploying models with SageMaker:
- Complexity: Unintuitive for users who want to deploy multiple types of ML models to production, each with its own specific requirements.
- Vendor lock-in: Constrained within the AWS ecosystem.
- Cost limitations: Paying more than most non-SageMaker alternatives, especially for large model deployments.
- Rapid prototyping: Slow iteration cycles make it hard to ship models to market swiftly.
Complexity
With SageMaker comes the burden of learning how to use different AWS services to deploy your model to production successfully:
- Amazon S3: To upload training datasets, store features, and save training output data for your hyperparameter tuning jobs.
- Amazon IAM: To set up roles, permissions, and access controls for SageMaker to communicate with other AWS services.
- Amazon API Gateway: To provide an external access point to Amazon SageMaker Inference endpoints.
- SageMaker Feature Store: To extract features from data and manage them to train ML models.
- Amazon CloudWatch: For monitoring the performance of SageMaker resources.
- Amazon VPC: To establish networking and communication between AWS services and external resources.
Amazon SageMaker also has several AI/ML components running under the hood for end-to-end machine learning workflows. We have heard users complain that the many SageMaker services do not all play well together. You may find it cumbersome to iterate across the entire workflow using different components before you can deploy a model, especially if you only want to deploy one.
The web-serving framework SageMaker provides could be more intuitive to use. We have met several CTOs who said they had to hire front-end engineers to build and wrap a custom UI around SageMaker (as well as Databricks) so that their teams could use it.
Vendor Lock-in
As with most services powered by big public cloud providers, vendor lock-in within the AWS ecosystem is a crucial concern for our users regarding SageMaker. The SageMaker API only works within that ecosystem, and its interoperability with other AWS services ranges from tight to moderate, depending on the service.
A key concern with vendor lock-in is that not every service in the ecosystem is well-developed enough to solve each aspect of your stack. If you rely on one component but need a best-in-class service elsewhere in your workflow, you may not be able to leverage external tools without incurring operational costs.
Cost Limitations
Large model deployments on SageMaker frequently escalate operational costs because they may require more resource-intensive instance types. While the SageMaker experience is fully managed, it abstracts many operational details, and features such as data processing, batch transform, notebook instances, training, and feature stores come with their own costs. For end-to-end ML workflows, these costs add up quickly and significantly increase the total cost of large model deployments.
AWS offers detailed pricing for each feature, but you are responsible for familiarizing yourself with the associated costs and actively monitoring expenses using tools like AWS Cost Explorer.
To avoid such “hidden costs,” carefully consider the resource requirements of your large model before deploying it on SageMaker. Also, consider SageMaker's built-in cost optimization features to help reduce your costs.
Rapid Prototyping
SageMaker's complexity often presents a challenge for many teams. This intricacy hinders the ability to prototype swiftly, forcing ML teams to adapt and reshape their workflow around the tool, rather than the tool enhancing their processes.
We have had discussions with users who find it challenging to move medium- to large-scale models through SageMaker components to quickly deploy, update, and ship new features without operational overhead. They cannot prototype with new model types rapidly because they have workflows configured only to support specific models.
In particular, they have had to write custom code and automations in order to make SageMaker work for them, and that code makes assumptions about model types and resource constraints. Those assumptions then get violated when the team wants to deploy new types of models. SageMaker can require such low-level configuration just to get working that the cost of changing its configurations to adapt to new model types becomes prohibitive.
Let’s look at SageMaker alternatives for shipping your models to production in the next section.
Alternatives to Amazon SageMaker Inference
Beyond Amazon SageMaker, several deployment platforms are available for hosting your machine learning models as endpoints. In this section, you will learn about some alternatives to AWS SageMaker Inference.
Here are other options:
- Modelbit.
- Ray Serve.
- Nvidia Triton Inference Server.
Modelbit
Modelbit simplifies deploying and managing machine learning models in production. It emphasizes usability and simplicity—quickly deploy models as REST APIs with an intuitive and user-friendly interface. This ease of use speeds up the deployment process to move ML models to market.
Modelbit also prioritizes monitoring and management, with features for keeping close tabs on the health and performance of the models you deploy. This is critical for maintaining model reliability in production and meeting service level agreements (SLAs).
In terms of pricing, you only pay for what you use. Modelbit has a monthly and an annual pricing model. Modelbit customers can also prepay for compute at a discounted rate.
Ray Serve
Ray is an open-source, all-encompassing computing framework for scaling various AI and Python workloads. It offers a seamless platform for extending the capabilities of AI and Python applications, covering a wide range of tasks, including reinforcement learning, deep learning, hyperparameter tuning, and model deployment.
Ray Serve is built on top of Ray. You deploy a machine learning model by applying the deployment decorator (`@serve.deployment`) to a Python class containing the prediction logic and composing one or more deployments into an application that handles inbound traffic. It serves large language models as well as “traditional” deep learning models.
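Here is a minimal sketch of that pattern, assuming a scikit-learn-style model pickled to a hypothetical `model.pkl` file:

```python
import pickle

from ray import serve
from starlette.requests import Request


@serve.deployment(num_replicas=2)
class ModelDeployment:
    def __init__(self):
        # Load the prediction logic once per replica.
        with open("model.pkl", "rb") as f:
            self.model = pickle.load(f)

    async def __call__(self, request: Request) -> dict:
        payload = await request.json()
        prediction = self.model.predict([payload["features"]])
        return {"prediction": prediction.tolist()}


# Compose the deployment into an application and serve it over HTTP.
app = ModelDeployment.bind()
serve.run(app)
```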
No direct pricing is associated with using the open-source Ray libraries, but Anyscale recently started offering Ray Serve as a managed service.
NVIDIA Triton Inference Server
Triton Inference Server is a critical component of the NVIDIA AI platform. It offers a unified and standardized approach to deploying and executing AI models.
Triton Inference Server is free (but the compute is, of course, not) and open to the community for use and contribution. NVIDIA also provides the option to purchase NVIDIA AI Enterprise, which includes Triton Inference Server, along with a suite of enhanced features and support services to meet the specific needs of businesses looking for comprehensive AI solutions.
Amazon SageMaker vs. Modelbit: A Comparative Analysis
In this comparative analysis, we will explore the essential features of Modelbit and Amazon SageMaker to help you choose the right solution for your model deployments.
These features are the criteria for comparison:
- Deployment Options: What variety and flexibility of methods does the platform provide for deploying models to production environments?
- Budget: How cost-effective is the platform, and what is its overall financial impact for model deployment and management?
- Scalability: How well does the platform handle traffic bursts and growth, and can it be scaled without performance bottlenecks?
- Ease of Use: How simple and intuitive is the platform? Can users and teams quickly grasp and use features?
- Integration: How interoperable is the platform with other tools, systems, and technologies used within the organization?
- Continuous Deployment: Does the platform support automated processes that allow for iterative and rapid deployment of models to production?
- Security and Compliance: What measures and features does the platform have to ensure data protection, privacy, and adherence to compliance standards?
- Monitoring and Logging: What capabilities does the platform offer for tracking, analyzing, and logging the health of your deployment?
We decided to compare Amazon SageMaker and Modelbit’s deployment capabilities based on these features because we see them repeatedly come up in conversations with users and in broad discussions in communities that widely use both platforms.
Let’s compare! 👀
Deployment Options
Amazon SageMaker:
- SageMaker offers real-time inference endpoints for low-latency requirements, serverless endpoints for sporadic traffic bursts, asynchronous endpoints for long processing tasks, and batch transforms for dataset predictions.
- Deploy models from the SageMaker Console, AWS CLI, and AWS SDK (Boto3).
Modelbit:
- Enables deploying custom ML models to production environments using REST APIs that support online real-time inferences and large batch requests.
- Deploy models directly from your development and notebook environments (Jupyter, Hex, Deepnote, and VS Code) with Python and Git APIs.
Budget
Amazon SageMaker:
- The pricing model is pay-as-you-go.
- On-demand ML, storage, and data processing instances: See the full breakdown.
- Charges for features beyond deployment, such as data processing, batch transform, notebook instances, training, and feature stores.
- SageMaker instances are usually costlier than running equivalent EC2 instances directly.
Modelbit:
- Pricing model is pay-as-you-go.
- Pricing per compute minute. See the full breakdown.
- Customers can also prepay for compute minutes at a discounted rate.
Scalability
Amazon SageMaker:
- SageMaker endpoints autoscale to production traffic.
Modelbit:
- Models deployed with Modelbit are deployed to fully isolated containers behind REST API endpoints that will autoscale to the model’s resource needs.
Ease of Use and Integration
Amazon SageMaker:
- Interoperability with other AWS services ranges from tight to loose, and SageMaker requires familiarity with those services. This may pose a learning curve for those new to the AWS ecosystem.
- The UI isn’t exactly intuitive. Using SageMaker might also require building a custom front-end around the entire system to make it usable.
Modelbit:
- Prioritizes simple usage by enabling fast ML model deployment for data scientists, especially through integrations with various Python environments (like Google Colab).
- Integrates with ML tools like neptune.ai, Weights & Biases, Arize AI, and Eppo.
- Integrates with business intelligence (BI) tools like Looker, Tableau, and Snowsight to monitor and visualize the model predictions.
Continuous Deployment
Amazon SageMaker:
- Offers features like SageMaker Pipelines and SageMaker Projects for managing ML pipelines and creating end-to-end ML solutions with CI/CD, respectively.
Modelbit:
- Streamlined with CI/CD tools you likely already use, like GitHub Actions, GitLab CI/CD, and Azure Pipelines.
- Automatically syncs with your git repo.
Security and Compliance
Amazon SageMaker:
- Secure HTTPS endpoints. You may have to set up AWS Resource Access Manager, use AWS PrivateLink, or route communication through API Gateway to coordinate access from third-party apps.
Modelbit:
- The single-tenant architecture and fully containerized deployments protect your code and data, whether you deploy on-premises or in the cloud.
Monitoring and Logging
Amazon SageMaker:
- Real-time monitoring: Uses Amazon CloudWatch for monitoring, which processes data into near real-time metrics and allows for real-time monitoring through CloudWatch Logs.
- Model drift detection: SageMaker Model Monitor provides monitoring and alerts to identify model quality deviations and detect drift.
- Data sync with warehouses: Integrates with various AWS data services, for example, Amazon Redshift, but might require intermediary services for interoperability.
Modelbit:
- Real-time monitoring: Provides real-time endpoint monitoring via Slack alerts. It integrates with DataDog, Arize, and other vendors for additional production monitoring.
- Model drift detection: Provides basic drift analysis out of the box and syncs logs to Snowflake, DataDog, and other log vendors for more sophisticated drift analysis.
- Data sync with warehouses: Synchronizes with Snowflake, Redshift, and Athena.
Phew! Now that you understand how Modelbit and SageMaker’s Inference options stack up, let’s put our concerns and comparisons into practice by comparing the workflows for deploying the same model.
Head over to the fun section 👇.
Practical Implementation: Deploying an ML model with SageMaker vs. Modelbit
It’s time to see both SageMaker and Modelbit in action! In this section, you will deploy an XGBoost model for a diabetes binary classification problem on the popular “Diabetes Dataset.” You will build and deploy the same model with Amazon SageMaker and Modelbit to practically compare the workflow for both platforms.
For the purpose of a balanced comparison, we will build and deploy the same model with the same hyperparameters and data preprocessing code.
Let’s start with Amazon SageMaker.
Deploy a Model Using Amazon SageMaker
First step, let’s set up the data and development environment.
Create an S3 bucket to store your data:
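If you prefer to script this step instead of clicking through the S3 console, a minimal Boto3 sketch looks like this (the bucket name and region are placeholders; bucket names must be globally unique):

```python
import boto3

bucket_name = "sagemaker-diabetes-demo"  # placeholder; choose a globally unique name
s3 = boto3.client("s3", region_name="us-east-1")

# In us-east-1 no CreateBucketConfiguration is needed; other regions require a LocationConstraint.
s3.create_bucket(Bucket=bucket_name)
```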
Create an Amazon SageMaker notebook instance. To do this, access the AWS Management Console and search for "SageMaker." This action will allow you to create a SageMaker notebook environment for development and model deployment.
After successfully setting up your SageMaker notebook instance, the next step is to ensure that your IAM (Identity and Access Management) role has the necessary permissions to access data in the S3 bucket.
Navigate to the IAM section in the AWS Management Console. Locate and select the IAM role associated with your SageMaker instance. Attach the appropriate S3 permissions to grant your SageMaker notebook the required access to the contents stored in the designated S3 bucket. This access is vital for effectively handling and utilizing the data within your SageMaker environment.
Here, the notebook's filename is "SageMaker-Deployment." Once you create the notebook, import your dataset from S3.
One common approach is to download the file from your S3 storage into your notebook's working directory. Subsequently, you can utilize a library like Pandas to read and manipulate the dataset.
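For example, a minimal sketch with Boto3 and Pandas (the bucket and file names are placeholders):

```python
import boto3
import pandas as pd

# Placeholder names; use the bucket and object key you created earlier.
bucket_name = "sagemaker-diabetes-demo"
s3 = boto3.client("s3")
s3.download_file(bucket_name, "diabetes.csv", "diabetes.csv")

df = pd.read_csv("diabetes.csv")
df.head()
```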
Find the complete code for this section in this Colab notebook.
Create your AWS SageMaker session and initialize the IAM execution role:
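Here is what that setup typically looks like with the SageMaker Python SDK:

```python
import sagemaker
from sagemaker import get_execution_role

session = sagemaker.Session()
role = get_execution_role()  # the IAM execution role attached to the notebook instance
region = session.boto_region_name
```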
Amazon SageMaker provides a default S3 bucket you can access using `sagemaker.Session().default_bucket()`. To streamline the process, use the following code block to upload the CSV files you downloaded locally in your Jupyter instance to this bucket.
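A minimal sketch of the upload step, assuming the local files are named `train.csv` and `validation.csv` and using a `diabetes` key prefix:

```python
bucket = session.default_bucket()
prefix = "diabetes"

# Upload the preprocessed splits so the training job can read them from S3.
train_path = session.upload_data("train.csv", bucket=bucket, key_prefix=prefix)
validation_path = session.upload_data("validation.csv", bucket=bucket, key_prefix=prefix)
```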
This step is essential for making the data accessible within the SageMaker environment. With the data successfully uploaded to the default S3 bucket, run the training code in the Colab notebook.
Here’s the code to train your model (a SageMaker Estimator) and fine-tune the parameters of the XGBoost model:
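A sketch of the training setup; the hyperparameter values below are illustrative rather than the tuned values from the notebook:

```python
from sagemaker import image_uris
from sagemaker.estimator import Estimator
from sagemaker.inputs import TrainingInput

# Retrieve the managed XGBoost container image for this region.
xgboost_image = image_uris.retrieve("xgboost", region=region, version="1.7-1")

xgb_estimator = Estimator(
    image_uri=xgboost_image,
    role=role,
    instance_count=1,
    instance_type="ml.m5.xlarge",
    output_path=f"s3://{bucket}/{prefix}/output",
    sagemaker_session=session,
)

# Illustrative hyperparameters for binary classification on the Diabetes Dataset.
xgb_estimator.set_hyperparameters(
    objective="binary:logistic",
    max_depth=5,
    eta=0.2,
    num_round=100,
)
```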
To initialize training, fit the estimator on the training and validation splits:
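A sketch of the fit call, assuming `train_path` and `validation_path` point to the CSVs uploaded earlier:

```python
train_input = TrainingInput(train_path, content_type="text/csv")
validation_input = TrainingInput(validation_path, content_type="text/csv")

# Launches a managed training job on the instance type configured above.
xgb_estimator.fit({"train": train_input, "validation": validation_input})
```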
The training process may take some time to complete, depending on the size of your data. Once it completes, you should see an output similar to the one below.
Once training is complete, deploy the model by calling `.deploy()` on the XGBoost SageMaker estimator you just fitted:
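A minimal sketch of the deployment call, reusing the fitted estimator from the previous step:

```python
xgb_predictor = xgb_estimator.deploy(
    initial_instance_count=1,
    instance_type="ml.m4.xlarge",
)
```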
The code deploys your model on a single ml.m4.xlarge instance.
Perfect! You have successfully deployed your AWS SageMaker model as an endpoint. Confirm deployment by heading to the SageMaker console >> Inference >> Endpoints.
After creating the endpoint, you can test it using Amazon SageMaker Studio, the AWS SDK, or the AWS CLI.
You would have to configure the endpoint to be accessible and test it from your applications.
Test the SageMaker Inference Endpoint
Test your SageMaker endpoint using the AWS SDK (Boto3). First, you must authenticate the request using an access key and secret credentials.
PS: You can also authenticate using shared credentials, a web identity provider, and a configuration file.
After successful authentication, pass a payload to the SageMaker endpoint.
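Here is a minimal sketch of both steps with Boto3; the access keys, endpoint name, and feature values are placeholders:

```python
import boto3

# For illustration only: prefer environment variables or an AWS profile over hard-coded keys.
boto_session = boto3.Session(
    aws_access_key_id="YOUR_ACCESS_KEY_ID",          # placeholder
    aws_secret_access_key="YOUR_SECRET_ACCESS_KEY",  # placeholder
    region_name="us-east-1",
)
runtime = boto_session.client("sagemaker-runtime")

# The built-in XGBoost container accepts CSV rows; these feature values are made up.
payload = "6,148,72,35,0,33.6,0.627,50"

response = runtime.invoke_endpoint(
    EndpointName="sagemaker-xgboost-diabetes",  # replace with your endpoint's name
    ContentType="text/csv",
    Body=payload,
)
print(response["Body"].read().decode("utf-8"))
```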
With more complex applications, you might need to create and manage APIs using Amazon API Gateway, create an execution role for the REST API, a mapping template for response integration, and deploy the API.
Remember to delete your endpoint when you are done with this demo to save costs. Delete the endpoint in your notebook and the configuration files:
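A minimal clean-up sketch using the predictor object from the deployment step:

```python
# Tear down the endpoint and its endpoint configuration to stop incurring charges.
xgb_predictor.delete_endpoint(delete_endpoint_config=True)
```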
Interested in learning how to deploy SageMaker models to Modelbit? Head over to our detailed tutorial: Deploying models built with AWS SageMaker
Deploy ML Models Using Modelbit
Modelbit gives you the option to deploy ML models as REST API endpoints directly from your notebooks using Python and Git APIs. In this section, you will deploy your model with a few lines of code from a Colab notebook to highlight the simplicity and quick time-to-market features.
Modelbit offers a free plan—sign up if you haven't already. It provides a fully custom Python environment backed by your git repo.
Install the Modelbit package via `pip` in your Google Colab (or Jupyter) notebook:
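```python
# In Colab or Jupyter, the "!" runs the command in the notebook's shell.
!pip install modelbit
```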
Follow the steps in this Colab notebook to load the sample dataset, train, and tune the XGBoost model.
Log into the Modelbit service and create a development ("dev") or staging ("stage") branch for staging your deployment. Learn how to work with branches in the docs.
If you cannot create a “dev” branch, you can use the default "main" branch for your deployment:
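A minimal sketch of the login step; the `branch` argument reflects Modelbit's branch workflow, and plain `modelbit.login()` works if you stay on the default "main" branch:

```python
import modelbit

# Authenticates this notebook kernel against your Modelbit workspace.
mb = modelbit.login(branch="dev")
```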
You should see a link to authenticate your kernel to connect to Modelbit. Click on that link to authenticate the notebook kernel.
After successful authentication, you should see an onboarding screen if it’s your first time using Modelbit or your dashboard if you are an existing user.
Now, you are ready to deploy the model! First, create a deployment function. This is necessary because modelbit.deploy() takes a callable deployment function as a parameter.
In this case, define the “diabetes_likelihood_prediction()” function that takes in features that could predict the likelihood of diabetes from a patient’s data, hypothetically, of course.
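A hypothetical version of that function; the feature names mirror the Diabetes Dataset columns, and `xgb_model` stands in for the XGBoost classifier trained earlier in the notebook:

```python
import pandas as pd

def diabetes_likelihood_prediction(
    pregnancies: int,
    glucose: float,
    blood_pressure: float,
    skin_thickness: float,
    insulin: float,
    bmi: float,
    diabetes_pedigree: float,
    age: int,
) -> int:
    # Assemble the inputs in the same column order the model was trained on.
    features = pd.DataFrame([[
        pregnancies, glucose, blood_pressure, skin_thickness,
        insulin, bmi, diabetes_pedigree, age,
    ]])
    # xgb_model is the tuned XGBoost classifier from the training cells above.
    return int(xgb_model.predict(features)[0])
```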
You are now production-ready! 🚀 Pass the model prediction function "diabetes_likelihood_prediction" and the project dependencies to the "mb.deploy()" API.
Deploy your prediction function:
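A sketch of the deploy call; the pinned package versions are illustrative, and Modelbit can also infer dependencies directly from the notebook environment:

```python
# mb.deploy picks up the function, the model object it references, and the environment.
# The optional python_packages argument pins versions explicitly if needed.
mb.deploy(
    diabetes_likelihood_prediction,
    python_packages=["xgboost==1.7.6", "pandas==2.0.3"],  # illustrative versions
)
```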
Calling “mb.deploy(diabetes_likelihood_prediction)” detects all your notebook dependencies, copies the environment configuration, and deploys the model and metadata files to Modelbit.
Modelbit runs a container build and creates a REST endpoint to access your model.
If everything works correctly, you should see the following output:
Test the Model Endpoint
Test the deployment by sending a request to the REST endpoint:
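A sketch of a test request; the workspace subdomain is a placeholder, and the exact URL is shown in the Modelbit UI after deployment:

```python
import requests

# Placeholder URL; copy the real endpoint URL from your Modelbit dashboard.
url = "https://your-workspace.app.modelbit.com/v1/diabetes_likelihood_prediction/latest"

# Positional arguments for the deployment function, in order.
payload = {"data": [6, 148, 72, 35, 0, 33.6, 0.627, 50]}

response = requests.post(url, json=payload)
print(response.json())
```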
The response returns a value of “1”, which means the model predicts that the patient is likely to have diabetes.
Check the “📚Logs” panel in the Modelbit UI to see real-time logs of every request made to your endpoint.
All you need to do to secure this endpoint is create an API key or authorize teammates; no sophisticated IAM configurations are required.
With two steps, “modelbit.login()” and “modelbit.deploy()”, you have a live production endpoint that:
- auto-scales
- responds in real-time
- consumes batch traffic
The best part? You can achieve all of this within your notebook environment without changing your current tech stack!
Need to ship a new model to the endpoint? Simply switch to a new git branch, and all deployments from your notebook will go through that branch. That's it! 😎.
Key Takeaways
From the analysis and code samples in this article, it's clear that deploying machine learning models with Modelbit is simpler compared to AWS SageMaker—although you can be the judge of that.
Here’s a recap of some notable advantages of using Modelbit:
1. Lightweight and intuitive deployment: The intricacies of SageMaker often pose challenges for its users. Once it's integrated into a workflow, making any modifications can feel like navigating a maze. Modelbit simplifies the deployment process. Instead of getting bogged down with endless configurations, you're just a few clicks away from deployment, thanks to its lightweight design that seamlessly integrates with your existing workflow.
2. Large model support: Don’t think that Modelbit’s lightweight nature limits it from deploying large models. It provides robust support for large models, especially for projects involving resource-intensive models. In fact, the one-click deployment could be advantageous when dealing with large models—it alleviates some of the complexities associated with their deployment. However, it is important to compare this with SageMaker Inference’s options to make an informed decision.
3. Affordable deployment: Users complain about paying a premium for inference with SageMaker—especially for large models. With Modelbit, the pricing structure accommodates various project sizes and budgets. This means more flexibility and cost savings for your production workloads.
4. On-demand Compute: One reason users stick with SageMaker is the availability of many instance types. Modelbit provides CPU and GPU compute resources on demand that autoscale to your training and production workloads. Compute is optimized to support large deployments.
5. Platform agnosticism: Another concern users have with SageMaker services is vendor lock-in within the AWS ecosystem. Modelbit allows you to deploy your models from anywhere your notebooks run, or perform inference anywhere your models live.
Final Thoughts
Modelbit’s affordability, simplicity, and support for small and large models make it an ideal alternative to Amazon SageMaker for model deployment. Whether you’re part of a small team seeking a cost-effective solution or a large team dealing with resource-intensive models, Modelbit’s products should cater to your production requirements.
Interested in exploring Modelbit further? The Getting Started guide is a good starting point. You can get started for free without the need to set up an entire SageMaker account and enable billing to deploy models.