By Eric Schrock, CTO & Ryan Boch, Senior Software Engineer
OM1™ is a leader in building deep clinical Real World Data (RWD) datasets to better understand patient journeys and outcomes. We use a combination of data sources, including EHR data, to create billions of data points for more than 330 million patients. However, EHRs primarily support operations and billing, so most deep clinical information remains in unstructured clinical notes. For example, you can tell from standard structured medical records that a patient was diagnosed with psoriasis. But you won't know what part of the body is affected, what symptoms the patient may be experiencing, or whether their condition has improved or worsened.
At OM1™, we use machine learning to process the raw text of hundreds of millions of doctors’ notes, patient records, and medical histories into novel insights. Over the last decade, advancements in machine learning have helped usher in a new era of clinical insights. We can now analyze vast amounts of unstructured data in ways no human can, identifying correlations between symptoms and outcomes that might elude even the most seasoned clinicians. The resulting data provides unprecedented insight into real-world patient journeys, bridging evidence gaps from bench to bedside.
These insights lead to increased medical treatment effectiveness, improved access to care, and personalized medical insights that predict patient trajectories and inform clinical decision-making. Our innovations are helping improve health outcomes and save lives every day.
At OM1™, we run large-scale data pipelines that process structured and unstructured data. These pipelines first standardize and organize the raw information, placing the raw text of doctors' notes, patient records, and other unstructured text-based data into a standard data store.
When we look at clinical notes, we have found that off-the-shelf natural language processing (NLP) models, even ones trained on clinical text, don’t fare particularly well. They struggle because clinicians often don’t write in “natural language,” using abbreviations, esoteric shorthand, and semi-structured patterns that vary across institutions and providers.
We use a collection of proprietary models and approaches to understand the semantic context of clinical text so we can extract and estimate structured clinical concepts, such as patient symptoms, disease severity, and patient outcomes.
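The models themselves are proprietary, but conceptually the output of this step can be pictured as structured records derived from free text. The sketch below is purely illustrative: the class, field names, and keyword matching are ours for this post, not the production system.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ExtractedConcept:
    """One structured clinical concept estimated from free-text notes.
    Field names are illustrative, not OM1's actual schema."""
    patient_id: str
    concept: str                       # e.g. "psoriasis_severity"
    value: str                         # e.g. "moderate"
    confidence: float                  # model confidence in [0, 1]
    source_span: Optional[str] = None  # note snippet the estimate came from


def extract_concepts(patient_id: str, note_text: str) -> list[ExtractedConcept]:
    """Toy stand-in for the proprietary extraction models: a real pipeline
    uses trained NLP models, not keyword matching."""
    lowered = note_text.lower()
    concepts: list[ExtractedConcept] = []
    if "psoriasis" in lowered:
        severity = "moderate" if "moderate" in lowered else "unspecified"
        concepts.append(ExtractedConcept(
            patient_id=patient_id,
            concept="psoriasis_severity",
            value=severity,
            confidence=0.5,  # placeholder confidence
        ))
    return concepts


# Example: extract_concepts("patient-123", "Moderate plaque psoriasis on elbows")
```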
This approach creates a powerful engine for understanding deep clinical narratives.
Our data pipelines are built primarily with dbt on top of Snowflake. Historically, when we needed to run non-SQL code, we had to build, deploy, and operate multiple compute environments that could run Python or other code. We made this work, but it came with additional costs.
These costs were consuming precious engineering bandwidth and dragging down productivity. We knew we wanted a more modern approach and were delighted when we learned how seamlessly Modelbit could operate within our existing Snowflake infrastructure and processes.
As an early adopter of Snowflake and dbt, we wanted to keep our data and compute environments as simple as possible. Writing and deploying non-SQL code should be an enjoyable experience and work seamlessly with our infrastructure-as-code and continuous integration foundation.
After learning about Modelbit through the Snowflake ecosystem, we were able to rapidly prototype, evaluate, and deploy new models without any of the legacy overhead of managing third-party compute environments. Within a few months, we developed and deployed new models into production using Modelbit. Here is how things look today:
We have multiple ML development environments, but they all leverage notebook concepts for developing and organizing our code. Modelbit's ability to deploy from anywhere makes it easy to use in every development environment we rely on, from local Python to SaaS notebook environments. All production code must be reviewed and versioned in Git, and Modelbit's deep Git integration makes versioning code and model artifacts straightforward. By defining all our infrastructure as code in our Git repo, we can leverage critical components of our development lifecycle, such as peer code review, continuous integration testing, and separate development/stage/production branches.
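As a simple illustration of that deploy-from-anywhere workflow, the sketch below trains a toy model in a notebook and deploys it with Modelbit's Python API (`modelbit.login()` and `mb.deploy()`). The model, features, and names are illustrative, not one of our production deployments.

```python
import modelbit
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Authenticate from a notebook or local Python session.
mb = modelbit.login()

# Illustrative model: a toy classifier over two made-up features.
train = pd.DataFrame({
    "feature_a": [0.1, 0.7, 0.3, 0.9],
    "feature_b": [1.0, 0.2, 0.8, 0.1],
    "outcome":   [0, 1, 0, 1],
})
clf = LogisticRegression().fit(train[["feature_a", "feature_b"]], train["outcome"])


def predict_outcome(feature_a: float, feature_b: float) -> float:
    """Inference function that Modelbit wraps as a deployment."""
    return float(clf.predict_proba([[feature_a, feature_b]])[0][1])


# Deploying captures the function, its dependencies (clf), and the Python
# environment, and versions everything through the connected Git repository.
mb.deploy(predict_outcome)
```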
dbt is our primary data orchestration tool at OM1. Standardizing on SQL has made it easier to develop, test, and collaborate on data pipelines. However, invoking Python-based transformations and ML models has always been a challenge. Modelbit produces a SQL function in Snowflake for each model and handles naming, versioning, and data marshaling. This enables seamless integration of Python models into our dbt environment and gives us flexibility in structuring our pipelines. When there is a series of Python steps to execute, we can place them together in a single model or package each one separately. When packaged independently, we can use Snowflake and dbt to manage orchestration and maximize parallelism without burdening users.
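To make that integration concrete, the sketch below queries a hypothetical Modelbit-generated SQL function from Python using the Snowflake connector. The function name, table, and connection details are illustrative; in a real pipeline the same SELECT sits inside a dbt model and references upstream models instead of a raw table.

```python
import os
import snowflake.connector

# Connection parameters are placeholders; in practice they come from dbt
# profiles or secrets management rather than hard-coded values.
conn = snowflake.connector.connect(
    account="my_account",
    user="my_user",
    password=os.environ["SNOWFLAKE_PASSWORD"],
    warehouse="ANALYTICS_WH",
    database="ANALYTICS",
    schema="ML",
)

# predict_outcome_latest is a hypothetical Modelbit-generated SQL function
# wrapping a deployed model; the real name and signature depend on the
# deployment's name and version.
query = """
    select
        patient_id,
        predict_outcome_latest(feature_a, feature_b) as predicted_outcome
    from analytics.ml.patient_features
"""

cur = conn.cursor()
try:
    cur.execute(query)
    for patient_id, predicted_outcome in cur.fetchall():
        print(patient_id, predicted_outcome)
finally:
    cur.close()
    conn.close()
```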
Snowflake is our primary data store at OM1. As Snowflake has expanded into broader computing capabilities, we have increasingly sought to keep processing within the Snowflake environment whenever possible. Snowpark provides Snowflake-native compute environments, but managing Python packages and deployments has been cumbersome. Modelbit can transparently deploy to Snowpark when models are compatible with the Snowpark runtime, allowing us to run Snowpark and non-Snowpark models through a common development and deployment framework. Everything is accessible through SQL functions, without consumers needing to manage the details.
Modelbit has helped mature our MLOps needs by providing a centralized registry of all our models, configurations, and versions. It provides one source of truth for what has been deployed and what is being used without having to comb through individual Git repositories or Snowflake logs. With Modelbit’s integrated observability, we can quickly see how models are executing and rapidly debug issues through centralized logs.
At OM1, we believe that personalized medicine is the way of the future, and AI is the path to get there. With access to large-scale, clinically rich Real-World Data, we are able to extract insights and predict outcomes that were previously thought impossible. These insights help the development and adoption of new therapies and inform clinical decision-making to improve patient health outcomes.
Rapidly developing and deploying new ML models with minimal overhead is critical to our success. What we have achieved so far came through the hard work of our engineering team, which developed and maintained custom tooling for executing Python models at scale. Modelbit alleviates that burden, giving us more bandwidth to focus on our strategic capabilities. More importantly, Modelbit provides a richer and more streamlined toolset so that we can iterate and deliver more quickly.
Tools like Modelbit are helping us experiment with new models faster and allowing our teams to spend more time doing what they do best: building transformational technologies for the healthcare industry.