Deep generalizable prediction of RNA secondary structure via base pair motif energy.

Journal: Nature Communications
Published Date:

Abstract

Deep learning methods have demonstrated strong performance for RNA secondary structure prediction. However, generalizability to unseen, out-of-distribution RNA families remains a common unsolved problem, which hinders further improvement of the accuracy and robustness of deep learning methods. Here we construct a base pair motif library that enumerates the complete space of the locally adjacent three-neighbor base pair and records the thermodynamic energy of the corresponding base pair motifs through de novo modeling of tertiary structures. We further develop a deep learning approach for RNA secondary structure prediction, named BPfold, which learns the relationship between the RNA sequence and the energy map of base pair motifs. Experiments on sequence-wise and family-wise datasets show that BPfold outperforms other state-of-the-art approaches in accuracy and generalizability. We hope this work contributes to integrating physical priors and deep learning methods for the further discovery of RNA structures and functionalities.

Authors

  • Heqin Zhu
    Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China.
  • Fenghe Tang
    School of Biomedical Engineering, Division of Life Sciences and Medicine, University of Science and Technology of China (USTC), Hefei, Anhui, China.
  • Quan Quan
    School of Computer Science and Engineering, Central South University, Changsha, 410083, People's Republic of China.
  • Ke Chen
    Department of Signal Processing, Tampere University of Technology, Finland.
  • Peng Xiong
    Key Laboratory of Digital Medical Engineering of Hebei Province, College of Electronic and Information Engineering, Hebei University, Baoding, Hebei 071002, P.R. China.
  • S Kevin Zhou