Because they learn as they go, machine learning models for drug discovery have to be continuously retrained as conditions change in drug production processes.
Enter cloud computing, which “offers a compelling suite of building blocks to sustain the machine learning (ML) life cycle integrated in iterative drug discovery,” according to the authors of a paper published online May 31 in Expert Opinion on Drug Discovery.
With that as a given, Ola Spjuth, PhD, of Uppsala University in Sweden and colleagues in academia and industry highlight the cloud’s applicable attributes.
Ranking high among these is easily scalable access to compute power, along with an “ecosystem of tools” specifically set up for data scientists working with ML modeling, the authors point out.
Many such tools are now tapping an emerging data technology called containerization, which “provides a lightweight alternative that packages everything needed to run an application—the code, its runtime dependencies and settings—as a standard, portable unit of software.”
Containerization can standardize how applications run across all cloud configurations, including public, private and hybrid, the authors explain.
The most widely used container orchestration platform today is Kubernetes (K8s), originally developed and open-sourced by Google.
“A Kubernetes cluster on top of local hardware is an increasingly common alternative for a private cloud platform, providing scientists easy-to-use and scalable access to CPUs (central processing units) and GPUs (graphics processing units),” Spjuth et al. continue.
The authors also introduce readers to MLOps, a term that combines machine learning and operations.
MLOps “describes the joint undertaking by data engineers, data scientists and operations professionals with the intention to cover the entire life cycle of ML modeling in production environments,” they explain. “In drug discovery, MLOps tries to bridge the disconnect in many organizations between pharmaceutical professionals, AI modelers and service providers for hosting production-grade ML models and to enable collaborations.”
Spjuth and colleagues note that drug discovery often requires working with data that’s either private or otherwise sensitive.
In these cases, cloud computing can help safeguard data while enabling collaborative drug discovery within and between organizations, they assert.
The paper is available in full for free.