Formulation and Delivery
Daniel Reker, MSc, DrSc (he/him/his)
Assistant Professor
Duke University
Machine learning is increasingly used to accelerate scientific research and development, particularly in pharmaceuticals, where it aims to support faster and more efficient drug discovery. While machine learning holds immense promise for improving ADMET properties and optimizing formulations, progress in these areas is limited by the scarcity of large, high-quality datasets. To address this challenge, active learning can guide data acquisition by allowing algorithms to identify and request the most informative data. This approach reduces the influence of human bias on data selection and can substantially improve predictive performance. Pairing algorithms in yoked learning campaigns, where one model selects data and another performs predictions, further enhances outcomes, particularly when using deep neural networks.
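As a rough illustration of the active learning loop described above, the Python sketch below repeatedly requests a label for the pool candidate on which a small committee of models disagrees most. It is not the speaker's implementation: the k-nearest-neighbor learner, the bootstrap committee used as an uncertainty estimate, the one-dimensional descriptor, and the names `active_learning` and `oracle` are all assumptions made for illustration. Here `oracle` stands in for running an experiment.

```python
import random
import statistics


def knn_predict(train, x, k=3):
    """Average the labels of the k training points nearest to x."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    return statistics.fmean(y for _, y in nearest)


def committee_variance(train, x, rng, n_models=10):
    """Disagreement at x among k-NN models fit to bootstrap resamples."""
    preds = []
    for _ in range(n_models):
        boot = [rng.choice(train) for _ in train]  # resample with replacement
        preds.append(knn_predict(boot, x))
    return statistics.pvariance(preds)


def active_learning(pool, oracle, n_init=3, n_queries=5, seed=0):
    """Pool-based active learning with query-by-committee selection.

    pool   : unlabeled candidates (hypothetical 1-D descriptors)
    oracle : callable returning the measured value for a candidate
    Returns the labeled set and the candidates that were never queried.
    """
    rng = random.Random(seed)
    pool = list(pool)
    rng.shuffle(pool)
    # Start from a few randomly chosen, labeled candidates.
    train = [(x, oracle(x)) for x in pool[:n_init]]
    pool = pool[n_init:]
    for _ in range(n_queries):
        # Request the label of the candidate the committee is least sure about.
        x_next = max(pool, key=lambda x: committee_variance(train, x, rng))
        pool.remove(x_next)
        train.append((x_next, oracle(x_next)))
    return train, pool
```

For example, `active_learning([i / 10 for i in range(30)], lambda x: x * x)` returns eight labeled points and the 22 candidates left in the pool. In a yoked setup, the labeled set chosen by this selector would be handed to a second, different model for prediction.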
Beyond active learning, data augmentation techniques can expand and strengthen datasets to improve model performance. For example, pairwise deep learning approaches that analyze molecular relationships can predict property differences between compounds, effectively increasing dataset size and enhancing algorithmic resolution. Such methods are particularly valuable for ADMET prediction and other drug development tasks where in vivo experimental data are limited. They also enable the inclusion of partially characterized compounds, allowing models to learn from incomplete information and make more generalizable predictions. We show that these algorithms enable us to more accurately predict interactions of drugs with metabolic enzymes.
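The pairwise idea can be sketched in a few lines of Python: N measured compounds yield N(N-1) ordered training pairs, a model learns property differences, and a new compound is predicted by anchoring it on every known compound and averaging. This is a minimal sketch under stated assumptions, not the method presented in the talk: a one-parameter linear model stands in for the deep network, each compound is reduced to a single hypothetical descriptor, and the data and function names are invented for illustration.

```python
from itertools import permutations
from statistics import fmean


def make_pairs(compounds):
    """Turn N (descriptor, property) records into N*(N-1) ordered pairs.

    Each pair holds (descriptor difference, property difference).
    """
    return [(xa - xb, ya - yb)
            for (xa, ya), (xb, yb) in permutations(compounds, 2)]


def fit_delta_model(pairs):
    """Least-squares slope w for: property difference ~ w * descriptor difference."""
    num = sum(dx * dy for dx, dy in pairs)
    den = sum(dx * dx for dx, _ in pairs)
    return num / den


def predict_property(x_new, compounds, w):
    """Anchor the new compound on each known one and average the estimates."""
    return fmean(y_ref + w * (x_new - x_ref) for x_ref, y_ref in compounds)


# Hypothetical toy data: (descriptor, measured property) for four compounds.
compounds = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2), (4.0, 7.8)]
pairs = make_pairs(compounds)   # 4 compounds -> 12 training pairs
w = fit_delta_model(pairs)
estimate = predict_property(2.5, compounds, w)
```

Four measurements become twelve training examples, which is the sense in which pairing increases the effective dataset size. Averaging over all reference compounds also means each prediction draws on every available measurement.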
Comparable advances are needed in drug delivery, where data scarcity and system complexity pose additional challenges. We have incorporated our paired learning approaches into prodrug design pipelines that have enabled us to create safer derivatives of antibiotics and anticancer agents. Beyond prodrug design, we rely on designing materials amenable to high-throughput experimentation as well as large-scale AI-assisted text mining workflows to generate datasets on formulations. These approaches have been successfully applied in nanoparticle development, including predictive modeling of in vivo tumor reduction by inorganic nanoparticles and machine learning–guided design of novel drug–excipient nanoparticles for antifungal and anticancer therapies.
Such examples demonstrate how the strategic integration of machine learning into drug development pipelines can accelerate discovery, improve drug delivery, and de-risk therapeutic innovation. Together, these advances highlight the potential of data-driven methods to optimize both drug development and drug delivery, ultimately supporting safer and more effective therapeutics.