Assessing Uncertainty in Machine Learning for Polymer Property Prediction: A Benchmark Study.
Journal:
Journal of Chemical Information and Modeling
Published Date:
Jul 14, 2025
Abstract
Machine learning (ML) has emerged as a transformative tool in materials science, enabling accelerated discovery and design of novel molecules while reducing experimental costs. Uncertainty quantification (UQ) is crucial for enhancing the reliability of ML predictions, particularly in high-stakes applications such as functional polymer discovery. In this study, we present a comprehensive evaluation of nine UQ methods in ML for predicting key polymer properties, including glass transition temperature (Tg), band gap (Eg), melting temperature (Tm), and decomposition temperature (Td). The nine methods are ensemble, Gaussian Process Regression (GPR), Monte Carlo Dropout (MCD), mean-variance estimation (MVE), Bayesian Neural Networks based on Variational Inference (BNN-VI) and on Markov Chain Monte Carlo (BNN-MCMC), evidential deep learning (EDL), quantile regression (QR), and natural gradient boosting (NGBoost). The models are assessed using three independent metrics: prediction accuracy (R²), Spearman's rank correlation coefficient, and calibration area, offering a robust framework for evaluating both mean predictions and uncertainty estimates. Our analysis spans data sets of four properties, out-of-distribution (OOD) experimental and molecular dynamics (MD)-derived data, high-Tg polymers, and diverse polymer types, providing a holistic perspective on model performance. Our findings reveal that optimal UQ method selection is highly context-dependent. The ensemble method consistently excelled for general in-distribution predictions across the four properties. For challenging OOD scenarios, BNN-MCMC offered a strong balance of predictive accuracy and reliable UQ. NGBoost emerged as the top-performing method for high-Tg polymers, effectively balancing accuracy and uncertainty characterization, with the ensemble method also providing excellent accuracy in this case. Furthermore, BNN-VI demonstrated superior and consistent performance across the nine distinct polymer classes evaluated.
This comprehensive benchmark underscores the critical importance of selecting tailored UQ strategies to enhance the trustworthiness of ML predictions, optimize experimental validation efforts, and ultimately accelerate the discovery of advanced functional polymers.
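
As a rough illustration of the three evaluation metrics named in the abstract, the sketch below computes R², a Spearman rank correlation, and a calibration area for a handful of predictions. The abstract does not give the exact definitions used in the study, so several choices here are assumptions: the Spearman coefficient is taken between absolute errors and predicted standard deviations, the calibration area assumes Gaussian predictive distributions and measures the gap between observed and expected interval coverage, and the Tg-like values and all function names are made up for the example.

```python
import math
from statistics import NormalDist, mean


def r2_score(y_true, y_pred):
    """Coefficient of determination of the mean predictions."""
    y_bar = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def _ranks(values):
    """1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks


def spearman(a, b):
    """Spearman's rho: Pearson correlation of the ranks of a and b."""
    ra, rb = _ranks(a), _ranks(b)
    ma, mb = mean(ra), mean(rb)
    cov = sum((x - ma) * (y - mb) for x, y in zip(ra, rb))
    var_a = sum((x - ma) ** 2 for x in ra)
    var_b = sum((y - mb) ** 2 for y in rb)
    return cov / math.sqrt(var_a * var_b)


def calibration_area(y_true, y_pred, sigma, n_levels=99):
    """Approximate area between observed and ideal coverage curves.

    Assumption: each prediction is Gaussian with mean y_pred and std sigma.
    0 means perfectly calibrated; larger values mean worse calibration.
    """
    std_normal = NormalDist()
    spacing = 1.0 / (n_levels + 1)
    # Absolute standardized residuals.
    z = [abs(t - p) / s for t, p, s in zip(y_true, y_pred, sigma)]
    area = 0.0
    for k in range(1, n_levels + 1):
        level = k * spacing  # nominal coverage of a central interval
        half_width = std_normal.inv_cdf(0.5 + level / 2)
        observed = sum(zi <= half_width for zi in z) / len(z)
        area += abs(observed - level) * spacing  # rectangle rule
    return area


# Hypothetical glass transition temperatures (K), predictions, and predicted stds.
y_true = [350.0, 410.0, 295.0, 380.0, 450.0]
y_pred = [342.0, 425.0, 300.0, 371.0, 420.0]
sigma = [10.0, 18.0, 6.0, 12.0, 35.0]

abs_err = [abs(t - p) for t, p in zip(y_true, y_pred)]
r2 = r2_score(y_true, y_pred)
rho = spearman(abs_err, sigma)
area = calibration_area(y_true, y_pred, sigma)
print(f"R2={r2:.3f}  Spearman={rho:.3f}  calibration area={area:.3f}")
```

Under these assumptions, a higher R² reflects better mean predictions, a Spearman coefficient near 1 means larger predicted uncertainties track larger errors, and a calibration area near 0 means the predicted intervals cover the data about as often as they claim to.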