AliNA - a deep learning program for RNA secondary structure prediction.

Journal: Molecular Informatics
Published Date:

Abstract

Numerous natural RNA variants that participate in different cellular processes have now been discovered, alongside artificial RNAs such as aptamers and riboswitches. Investigating their functions, their mechanisms of influence on cells, and their interactions with targets requires predicting RNA secondary structure. Classic thermodynamics-based prediction algorithms do not account for the specifics of biological folding, and the deep learning methods designed to resolve this issue suffer from the problems of homology-based methods. Herein, we present AliNA (ALIgned Nucleic Acids), a deep learning method for RNA secondary structure prediction. Thanks to data augmentation techniques, our method successfully predicts secondary structures for RNA families that are non-homologous to the training data. Augmentation extends existing datasets with easily accessible simulated data. The proposed method shows high prediction quality across different benchmarks, including structures with pseudoknots. The method is freely available on GitHub (https://github.com/Arty40m/AliNA).

Authors

  • Shamsudin S Nasaev
    Institute of Biomedical Chemistry, 10, Pogodinskaya str., 119121, Moscow, Russia.
  • Artem R Mukanov
    A.M. Butlerov Institute of Chemistry, Kazan Federal University, 18, Kremlyovskaya str., 420008, Kazan, Russia.
  • Ivan I Kuznetsov
    Moscow University of Finance and Law, 10 block 1, Serpuhovsky val str., 115191, Moscow, Russia.
  • Alexander V Veselovsky
    Institute of Biomedical Chemistry, 10, Pogodinskaya str., 119121, Moscow, Russia.