Combining knowledge distillation and neural networks to predict protein secondary structure.

Journal: Scientific reports
Published Date:

Abstract

The secondary structure of a protein serves as the foundation for constructing its three-dimensional (3D) structure, which in turn is critical for determining its function and role in biological processes. Therefore, accurately predicting secondary structure not only facilitates the understanding of a protein's 3D conformation but also provides essential insights into its interactions, functional mechanisms, and potential applications in biomedical research. Deep learning models are particularly effective in protein secondary structure prediction because of their ability to process complex sequence data and extract meaningful patterns, thereby increasing prediction accuracy and efficiency. This study proposes a combined model, ITBM-KD, which integrates an improved temporal convolutional network (TCN), bidirectional recurrent neural network (BiRNN), and multilayer perceptron (MLP) to increase the accuracy of protein secondary structure prediction for octapeptides and tripeptides. By combining one-hot encoding, word vector representation of physicochemical properties, and knowledge distillation with the ProtT5 model, the proposed model achieves excellent performance on multiple datasets. To evaluate its effectiveness, two classic datasets, TS115 and CB513, containing 115 and 513 protein datasets, respectively, were used. In addition, 15,078 protein data points collected from the PDB database from June 6, 2018, to June 6, 2020, were used to further verify the robustness and generalizability of the model. This study improves prediction accuracy and provides an essential model for understanding protein structure and function, especially in resource-limited settings.

Authors

  • Lufei Zhao
    Agricultural Science and Engineering School, Liaocheng University, Liaocheng, 252059, China.
  • Jingyi Li
    Key Laboratory of Agricultural Animal Genetics, Breeding and Reproduction of Ministry of Education, College of Animal Science and Technology and College of Veterinary Medicine, Huazhong Agricultural University, 430070 Wuhan, PR China. Electronic address: lijingyi@mail.hzau.edu.cn.
  • Biao Zhang
    Heilongjiang Provincial Key Laboratory of Complex Intelligent System and Integration, School of Automation, Harbin University of Science and Technology, Harbin 150080, China.
  • Xuchu Jiang
    School of Statistics and Mathematics, Zhongnan University of Economics and Law, Wuhan, 430073, China. xuchujiang@zuel.edu.cn.