Role of High Fidelity Vs. Low Fidelity Experimental Data in Machine Learning Model Performance for Predicting Polymer Solubility.

Journal: Macromolecular Rapid Communications
Published Date:

Abstract

Reliable classification of polymer-solvent compatibility is essential for solution formulation and materials discovery. Applying machine learning (ML) and artificial intelligence to this task is of growing interest in polymer science, but the effectiveness of such models depends on the quality and nature of the training data. This study evaluates how experimental data fidelity, as set by the experimental method, influences ML model performance by comparing classifiers trained on two experimental datasets: a high-fidelity dataset generated from turbidity-based measurements using a Crystal16 parallel crystallizer, and a low-fidelity dataset derived from visual solubility inspection. In both datasets, polymers were represented by one-hot encoding and solvents by Morgan fingerprints, and XGBoost classifiers were trained to assign one of three solubility labels: soluble, partially soluble, or insoluble. Confusion matrices showed that models trained on high-fidelity data better captured partially soluble behavior and more clearly distinguished between classes, highlighting the advantage of quantitative measurements over subjective classification. We also found that adding temperature as a feature improved prediction accuracy for the low-fidelity dataset, a key consideration for literature-derived data, which often lacks this information. These findings underscore the importance of experimental rigor and completeness when developing generalizable ML-based tools for polymer solubility prediction.
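
The feature layout and the confusion-matrix evaluation described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code. The polymer names, the 8-bit solvent vector, and the labels below are hypothetical placeholders. The sketch assumes the solvent fingerprint has already been computed elsewhere (for example with a cheminformatics toolkit) and omits the classifier itself, so the predictions are written by hand.

```python
from collections import Counter

# Class order used for the confusion matrix (the three labels named in the abstract).
LABELS = ["soluble", "partially soluble", "insoluble"]


def one_hot(item, vocabulary):
    """Return a 0/1 list with a single 1 at the position of `item`."""
    return [1 if item == v else 0 for v in vocabulary]


def build_features(polymer, solvent_bits, polymers, temperature_c=None):
    """Concatenate polymer one-hot, solvent fingerprint bits, and optional temperature."""
    row = one_hot(polymer, polymers) + list(solvent_bits)
    if temperature_c is not None:
        # Temperature is appended as one extra numeric column when it is known.
        row.append(float(temperature_c))
    return row


def confusion_matrix(y_true, y_pred, labels=LABELS):
    """Rows are true classes, columns are predicted classes."""
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(t, p)] for p in labels] for t in labels]


# Hypothetical toy inputs, for illustration only.
polymers = ["PS", "PMMA", "PVC"]
solvent_bits = [1, 0, 1, 0, 0, 1, 0, 0]  # placeholder for a precomputed fingerprint
row = build_features("PMMA", solvent_bits, polymers, temperature_c=25)

# Hand-written labels standing in for classifier output.
y_true = ["soluble", "partially soluble", "insoluble", "partially soluble"]
y_pred = ["soluble", "soluble", "insoluble", "partially soluble"]
cm = confusion_matrix(y_true, y_pred)

for label, counts_row in zip(LABELS, cm):
    print(f"{label:>18}: {counts_row}")
```

In this layout, off-diagonal entries in the "partially soluble" row count partially soluble samples that were assigned to another class, which is the kind of misclassification the abstract reports as less frequent for models trained on the high-fidelity data.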

Authors

  • Mona Amrihesari
    School of Chemical and Biomolecular Engineering, Georgia Institute of Technology, Atlanta, Georgia, USA.
  • Manali Banerjee
    School of Materials Science and Engineering, Georgia Institute of Technology, Atlanta, Georgia, USA.
  • Raul Olmedo
    School of Chemical and Biomolecular Engineering, Georgia Institute of Technology, Atlanta, Georgia, USA.
  • Blair Brettmann
    School of Chemical and Biomolecular Engineering, Georgia Institute of Technology, Atlanta, Georgia, USA.
