Heat Capacity of Ionic Liquids: Toward Interpretable Chemical Structure-Based Machine Learning Approaches.
Journal:
Journal of chemical information and modeling
PMID:
40208008
Abstract
This study focuses on predicting the heat capacity of pure liquid-phase ionic liquids (ILs) using machine learning models from various categories, including support vector machines, instance-based learning, ensemble learning, and neural networks, with linear regression serving as a baseline. A key aim of this work is not only to achieve accurate predictions but also to ensure the interpretability of the results, addressing a gap often overlooked in predictive modeling studies. To accomplish this, we curated and cleaned a comprehensive data set of 13,893 data points covering 322 ILs, using temperature and chemical structure-based features as inputs. We evaluated model performance and conducted a thorough interpretability analysis to reveal the patterns behind the top-performing model's predictions, ensuring that they are understandable. All models outperformed the baseline, with XGBoost (eXtreme Gradient Boosting) from the ensemble learning category achieving the best results, with total RMSE, R², and AARD values of 11.389, 0.997, and 1.212%, respectively. Shallow neural networks also performed competitively, suggesting that complex deep learning architectures may not be necessary. Both 10-fold and leave-one-IL-out (LOILO) cross-validation further confirmed the robustness of these results. Importantly, the interpretability analysis identified key factors influencing heat capacity predictions, such as anion size (e.g., NTf2 and FAP) and alkyl chain length. These factors were validated by testing the model on previously unseen IL examples. Additionally, a user-friendly web application was developed to make predictions, allowing users to input chemical groups or select compounds from a predefined list of 1633 ILs. This study underscores the importance of combining diverse modeling approaches with robust interpretability techniques to achieve reliable and explainable predictions for IL heat capacity.
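The abstract reports RMSE, R², and AARD and mentions leave-one-IL-out (LOILO) cross-validation without defining them. The sketch below is only an illustration, assuming the conventional textbook definitions of these metrics and a grouping of data points by an IL identifier; the abstract does not give the authors' exact formulas or code, and all function names here are hypothetical.

```python
import math
from collections import defaultdict


def rmse(y_true, y_pred):
    """Root-mean-square error (assumed standard definition)."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def r2(y_true, y_pred):
    """Coefficient of determination (assumed standard definition)."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def aard_percent(y_true, y_pred):
    """Average absolute relative deviation in percent (assumed definition)."""
    n = len(y_true)
    return 100.0 * sum(abs((p - t) / t) for t, p in zip(y_true, y_pred)) / n


def leave_one_il_out(il_ids):
    """Yield (held_out_il, train_indices, test_indices), one fold per IL.

    Every data point of the held-out IL goes to the test fold, so the
    model is always evaluated on an IL it has not seen during training.
    """
    by_il = defaultdict(list)
    for idx, il in enumerate(il_ids):
        by_il[il].append(idx)
    for held_out, test_idx in by_il.items():
        train_idx = [i for i, il in enumerate(il_ids) if il != held_out]
        yield held_out, train_idx, test_idx


# Toy values, not taken from the study.
y_true = [100.0, 200.0, 400.0]
y_pred = [110.0, 190.0, 400.0]
metrics = (rmse(y_true, y_pred), r2(y_true, y_pred), aard_percent(y_true, y_pred))

il_ids = ["A", "A", "B", "C", "C"]  # hypothetical IL labels per data point
folds = list(leave_one_il_out(il_ids))
```

Grouping folds by IL rather than by individual data point matters because one IL typically contributes many temperature points; a random split would leave the same compound on both sides of the split and overstate generalization to new ILs.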