DCBLSTM-Deep Convolutional Bidirectional Long Short-Term Memory neural network for Q8 secondary protein structure prediction.

Journal: Computers in biology and medicine
Published Date:

Abstract

Protein secondary structure prediction determines a protein's secondary structure from its primary amino acid sequence and is a critical step toward tertiary structure prediction, which in turn is essential for drug design, protein engineering, and genetic research. Because the task is complex, recurrent architectures such as Gated Recurrent Units (GRUs) and Long Short-Term Memory (LSTM) networks are often employed: they capture long-range dependencies between amino acids and thereby improve prediction accuracy. In this study, we used Bidirectional Long Short-Term Memory (BLSTM) networks, which process protein sequences in both the forward and backward directions, an approach that has shown considerable promise in this domain. To capture dependencies between adjacent amino acids, the architecture also incorporates a local feature encoding and extraction module consisting of three 1-dimensional convolutional layers. The model was refined with several optimization and regularization techniques, including batch normalization, kernel initialization, kernel regularization, dropout, and pooling layers, and the value of each hyperparameter was selected through systematic tuning. The final model, termed Deep Convolutional BLSTM (DCBLSTM), was evaluated on three publicly available and widely used datasets: CB513, CASP10, and CASP11. For Q8-state classification, it achieved accuracies of 88.9%, 83.9%, and 84.3% on these datasets, respectively. These results indicate that the proposed model delivers state-of-the-art accuracy and outperforms several existing benchmark models. The consistently high accuracy across all three datasets highlights the effectiveness and robustness of DCBLSTM for protein secondary structure prediction.
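
The Q8 accuracy reported above is conventionally the percentage of residues whose predicted state matches the true state among the eight DSSP classes. The following is a minimal sketch of that metric, not the authors' evaluation code. The function name `q8_accuracy`, the use of "C" for coil (some datasets write "L" or "-"), and the example sequences are assumptions made for illustration. The abstract does not state whether accuracy is pooled over all residues or averaged per protein; this sketch scores a single sequence.

```python
# The eight DSSP secondary-structure states used in Q8 classification:
# H (alpha helix), G (3-10 helix), I (pi helix), E (beta strand),
# B (beta bridge), T (turn), S (bend), C (coil).
# Assumption: coil is written "C"; some datasets use "L" or "-" instead.
Q8_STATES = ("H", "G", "I", "E", "B", "T", "S", "C")


def q8_accuracy(true_labels, predicted_labels):
    """Return the percentage of residues whose predicted state equals the true state."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label sequences must have the same length")
    if not true_labels:
        raise ValueError("label sequences must not be empty")
    for state in set(true_labels) | set(predicted_labels):
        if state not in Q8_STATES:
            raise ValueError(f"unknown Q8 state: {state!r}")
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return 100.0 * correct / len(true_labels)


# Hypothetical 10-residue example: positions 4 and 10 disagree, so 8 of 10 match.
true_ss = "HHHHTTEEEC"
pred_ss = "HHHGTTEEEE"
print(q8_accuracy(true_ss, pred_ss))  # prints 80.0
```

Benchmark pipelines often also mask padded or unresolved positions before scoring; that step is omitted here for brevity.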

Authors

  • Suvidhi Banthia
    Department of Data Science and Computer Applications, Manipal Institute of Technology, Manipal Academy of Higher Education (MAHE), Manipal, 576104, Karnataka, India. Electronic address: suvidhi.banthia@learner.manipal.edu.
  • Adam Mckenna
    School of Electronics, Electrical Engineering and Computer Science, Queen's University of Belfast, University Road, BT7 1NN Belfast, United Kingdom. Electronic address: amckenna41@qub.ac.uk.
  • Shailendra Kumar Tiwari
    Department of Electronics and Communication Engineering, Manipal Institute of Technology, Manipal Academy of Higher Education (MAHE), Manipal, 576104, Karnataka, India. Electronic address: sk.tiwari@manipal.edu.
  • Sandhya P N Dubey
    Department of Data Science and Computer Applications, Manipal Institute of Technology, Manipal Academy of Higher Education (MAHE), Manipal, 576104, Karnataka, India. Electronic address: sandhya.dubey@manipal.edu.
