Developing Hybrid Machine Learning Frameworks for Polymer Property Prediction Based on Composition and Sequence Features.

Journal: Journal of Chemical Information and Modeling
Published Date:

Abstract

Artificial intelligence (AI) plays a significant role in advancing polymer science and engineering. Because the glass transition temperature (Tg) is critical in determining the physical properties of polymers, this study systematically investigates how polymer composition and sequence structure influence Tg using machine learning (ML) models. To clarify the complex relationship between polymer composition and Tg, the k-nearest neighbor mega-trend diffusion (kNNMTD) method was employed for data augmentation, and various ML models were constructed for prediction. Among them, the Random Forest model performed best on the generated data, achieving an R² of 0.85 and an RMSE of 0.38. To explore the effect of polymer sequence structure on Tg, natural language processing (NLP) techniques were then introduced to represent polymer sequences. The data were augmented using a Wasserstein generative adversarial network (GAN) with gradient penalty (WGAN-GP), and predictions were made using a convolutional neural network-long short-term memory (CNN-LSTM) model. This integrated framework achieved an R² of 0.95 and an RMSE of 0.23 and generalized well across different data sets. In summary, this study introduces the use of kNNMTD for augmenting polymer composition data, combined with NLP techniques for representing polymer sequences. The proposed ML framework is intended to support polymer material design and optimization.
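
The two metrics reported in the abstract, the coefficient of determination and the root-mean-square error (RMSE), can be computed as in the minimal sketch below. It uses the standard textbook definitions. The function names and the sample values are hypothetical and are not taken from the study, and the abstract does not state the units or scaling of the reported RMSE values.

```python
import math


def rmse(y_true, y_pred):
    """Root-mean-square error between observed and predicted values."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all observed values are equal")
    return 1.0 - ss_res / ss_tot


# Hypothetical, illustrative values only (not data from the study).
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.0, 2.0, 3.0, 5.0]

print("R^2 :", r2_score(observed, predicted))
print("RMSE:", rmse(observed, predicted))
```

A higher R² together with a lower RMSE, as reported for the CNN-LSTM framework relative to the Random Forest model, indicates a closer fit, although the two models were evaluated on different representations and data sets.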

Authors

  • Qian Li
    Emergency and Critical Care Center, Department of Emergency Medicine, Zhejiang Provincial People's Hospital, People's Hospital of Hangzhou Medical College, Hangzhou, Zhejiang, China.
  • Siqi Zhan
    State Key Laboratory of Organic-Inorganic Composites, College of Materials Science and Engineering, Beijing University of Chemical Technology, Beijing 100029, PR China.
  • Zhanjie Liu
    College of Mathematics and Physics, Beijing University of Chemical Technology, Beijing 100029, PR China.
  • Caibo Dong
    College of Information Science and Technology, Beijing University of Chemical Technology, Beijing 100029, PR China.
  • Hengheng Zhao
    State Key Laboratory of Organic-Inorganic Composites, College of Materials Science and Engineering, Beijing University of Chemical Technology, Beijing 100029, PR China.
  • Tongkui Yue
    State Key Laboratory of Organic-Inorganic Composites, College of Materials Science and Engineering, Beijing University of Chemical Technology, Beijing 100029, PR China.
  • Qingsong Zhao
    National Engineering Research Center for Synthesis of Novel Rubber and Plastic Materials, Yanshan Branch of Beijing Research Institute of Chemical Industry, China Petroleum & Chemical Company (Sinopec Corp.), Beijing 102500, PR China.
  • Liqun Zhang
    Department of Biomedical Engineering, Sichuan University, Chengdu, China.
  • Ying Li
    School of Information Engineering, Chang'an University, Xi'an 710010, China.
  • Jun Liu
    Department of Radiology, Second Xiangya Hospital, Changsha, Hunan, China.