Improved Machine Learning Predictions of EC50s Using Uncertainty Estimation from Dose-Response Data.

Journal: Journal of Chemical Information and Modeling

Published Date: May 19, 2025

Abstract

In early-stage drug design, machine learning models often rely on compressed representations of data, where raw experimental results are distilled into a single metric per molecule through curve fitting. This process discards valuable information about the quality of the curve fit. In this study, we incorporated a fit-quality metric into machine learning models to capture the reliability of metrics for individual molecules. Using 40 data sets from PubChem (public) and BASF (private), we demonstrated that including this quality metric can significantly improve predictive performance without additional experiments. Four methods were tested: random forests with parametric bootstrap, weighted random forests, variable output smearing random forests, and weighted support vector regression. When using fit-quality metrics, at least one of these methods led to a statistically significant improvement on 31 of the 40 data sets. In the best case, these methods led to a 22% reduction in the root-mean-squared error of the models. Overall, our results demonstrate that by adapting data processing to account for curve fit quality, we can improve predictive performance across a range of different data sets.
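As a rough illustration of two of the ideas named in the abstract (weighting training examples and a parametric bootstrap of the targets), the sketch below uses only the Python standard library. The abstract does not state how the fit-quality metric is converted into weights or noise, so the inverse-variance weighting, the Gaussian resampling, the function names, and the example numbers are all assumptions for illustration and not the authors' implementation.

```python
import random
import statistics


def inverse_variance_weights(std_errors, floor=1e-6):
    """Map per-molecule fit standard errors to weights with mean 1.

    Assumption: a smaller standard error means a more reliable EC50,
    so the weight is taken as 1 / se**2 (not stated in the abstract).
    """
    raw = [1.0 / max(se, floor) ** 2 for se in std_errors]
    total = sum(raw)
    return [w * len(raw) / total for w in raw]


def parametric_bootstrap_targets(means, std_errors, n_replicates, seed=0):
    """Draw noisy copies of the targets, one list per ensemble member.

    Assumption: each target is resampled from a Gaussian centred on the
    fitted value with the fit standard error as its spread.
    """
    rng = random.Random(seed)
    return [
        [rng.gauss(m, se) for m, se in zip(means, std_errors)]
        for _ in range(n_replicates)
    ]


# Hypothetical pEC50 estimates and curve-fit standard errors for three molecules.
pec50 = [6.2, 5.1, 7.4]
fit_se = [0.05, 0.60, 0.20]

weights = inverse_variance_weights(fit_se)
replicates = parametric_bootstrap_targets(pec50, fit_se, n_replicates=100)

# Spread of the resampled targets per molecule across replicates.
spread = [statistics.pstdev(col) for col in zip(*replicates)]
```

In a sketch like this, `weights` could be handed to any learner that accepts per-sample weights (for example the `sample_weight` argument of scikit-learn's `RandomForestRegressor.fit`), while each list in `replicates` could serve as the training targets for one member of an ensemble, so that poorly fitted curves contribute noisier or less influential labels.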

Authors

Hugo Bellamy

Department of Chemical Engineering and Biotechnology, University of Cambridge, Cambridge CB2 1TN, United Kingdom of Great Britain and Northern Ireland.

Joachim Dickhaut

BASF, Ludwigshafen 67056, Germany.

Ross D King

Department of Biology and Biological Engineering, Division of Systems and Synthetic Biology, Chalmers University of Technology, Kemivägen 10, SE-412 96 Gothenburg, Sweden.

Keywords

Dose-Response Relationship, Drug; Machine Learning; Uncertainty

External Resources

PubMed ID: 40384077
