Developing Hybrid Machine Learning Frameworks for Polymer Property Prediction Based on Composition and Sequence Features.
Journal:
Journal of Chemical Information and Modeling
Published Date:
Jul 6, 2025
Abstract
Artificial intelligence (AI) plays a significant role in advancing polymer science and engineering. Because the glass transition temperature (Tg) largely determines the physical properties of polymers, this study systematically investigates how polymer composition and sequence structure influence Tg using machine learning (ML) models. To clarify the complex relationship between polymer composition and Tg, the k-nearest neighbor mega-trend diffusion (kNNMTD) method was employed for data augmentation, and various ML models were constructed for prediction. Among them, the Random Forest model performed best on the generated data, achieving an R² of 0.85 and an RMSE of 0.38. To explore the effect of polymer sequence structure on Tg, we further introduced natural language processing (NLP) techniques to represent polymer sequences. These data were augmented using a Wasserstein generative adversarial network with gradient penalty (WGAN-GP), and predictions were made with a convolutional neural network-long short-term memory (CNN-LSTM) model. This integrated framework achieved an R² of 0.95 and an RMSE of 0.23 and generalized well across different data sets. In summary, this study introduces an innovative application of kNNMTD to augment polymer composition data, combined with NLP techniques for representing polymer sequences. The proposed ML framework contributes to the design and optimization of polymer materials.
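The abstract names kNNMTD but does not describe how it generates synthetic samples. The sketch below is a simplified, hypothetical illustration. It assumes the commonly used mega-trend diffusion (MTD) bound estimate: a triangular membership function peaked at the midpoint of the observed range, with skew-weighted lower and upper bounds. It also assumes that MTD is applied independently to each feature of every sample's k nearest neighbors. The function names `mtd_bounds`, `mtd_sample`, and `knnmtd`, the toy data, and the values of `k` and `n_per_sample` are assumptions for illustration, not the authors' implementation or settings.

```python
import numpy as np

# Hypothetical, simplified kNNMTD sketch: MTD applied per feature to each
# sample's k-nearest-neighbor set. Not the study's actual implementation.


def mtd_bounds(x):
    """Return (lower, center, upper): the diffused domain of one feature in a small sample."""
    x = np.asarray(x, dtype=float)
    lo, hi = x.min(), x.max()
    center = (lo + hi) / 2.0
    var = x.var(ddof=1) if x.size > 1 else 0.0
    if var == 0.0:
        # No spread to diffuse: the domain collapses to the single observed value.
        return lo, center, hi
    n_l = np.sum(x < center)  # points below the center
    n_u = np.sum(x > center)  # points above the center
    skew_l = n_l / (n_l + n_u)
    skew_u = n_u / (n_l + n_u)
    # Assumed MTD bound estimate; ln(1e-20) sets how far the tails diffuse.
    spread = -2.0 * var * np.log(1e-20)
    lower = center - skew_l * np.sqrt(spread / n_l)
    upper = center + skew_u * np.sqrt(spread / n_u)
    # Never shrink the domain below the observed range.
    return min(lower, lo), center, max(upper, hi)


def mtd_sample(x, n, rng):
    """Draw n synthetic values for one feature by rejection sampling under a triangular membership function."""
    lower, center, upper = mtd_bounds(x)
    if upper == lower:
        return np.full(n, center)
    out = []
    while len(out) < n:
        v = rng.uniform(lower, upper)
        # Membership rises linearly from `lower` to 1 at `center`, then falls to 0 at `upper`.
        if v <= center:
            mf = (v - lower) / (center - lower)
        else:
            mf = (upper - v) / (upper - center)
        if rng.uniform() < mf:
            out.append(v)
    return np.array(out)


def knnmtd(X, k=3, n_per_sample=5, seed=0):
    """Augment a small feature matrix X (rows = samples, features assumed already scaled)."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    synthetic = []
    for i in range(len(X)):
        dist = np.linalg.norm(X - X[i], axis=1)
        neighbors = X[np.argsort(dist)[:k]]  # includes sample i itself
        cols = [mtd_sample(neighbors[:, j], n_per_sample, rng) for j in range(X.shape[1])]
        synthetic.append(np.column_stack(cols))
    return np.vstack(synthetic)


if __name__ == "__main__":
    # Toy scaled descriptors (hypothetical, not data from the study).
    X_small = np.array([[0.1, 1.2], [0.3, 0.8], [0.4, 1.5],
                        [0.6, 0.9], [0.8, 1.7], [0.9, 1.1]])
    X_aug = knnmtd(X_small, k=3, n_per_sample=5)
    print(X_aug.shape)  # (30, 2): 6 samples x 5 synthetic rows each
```

To augment (composition, Tg) pairs for regression, one could append Tg as an extra column before calling `knnmtd`. Sampling each feature independently ignores correlations within a neighborhood, and the actual kNNMTD procedure may handle this differently.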