Deep learning for RNA secondary structure determination: gauging generalizability and broadening the scope of traditional methods.

Journal: RNA (New York, N.Y.)
Published Date:

Abstract

The diverse regulatory functions, protein production capacity, and stability of natural and synthetic RNAs are closely tied to their ability to fold into intricate structures. Determining RNA structure is thus fundamental to RNA biology and bioengineering. Among existing approaches to structure determination, computational secondary structure prediction offers a rapid and low-cost strategy and is thus widely used, especially when seeking to identify functional RNA elements in large transcriptomes or screen massive libraries of novel designs. While traditional approaches rely on detailed measurements of folding energetics and/or probabilistic modeling of structural data, recent years have witnessed a surge in deep learning methods, inspired by their tremendous success in protein structure prediction. However, the limited diversity and volume of known RNA structures can impede the ability of these methods to accurately predict structures markedly different from the ones they were trained on. This is known as the generalization gap, and it currently poses a major barrier to progress in the field. In this Perspective article, we gauge method generalizability using a new benchmark data set of structured RNAs we curated from the Protein Data Bank. We also discuss the emergence of deep learning methods for predicting structure probing data and use a new data set to underscore generalization challenges unique to this domain, along with directions for future improvement. Expanding beyond improving predictive accuracy, we review how advances in deep learning have recently enabled scalable and accessible optimization of traditional structure prediction methods and their seamless integration with modern neural networks.

Authors

  • Marcell Szikszai
    Department of Computer Science & Software Engineering, The University of Western Australia, Perth, WA 6009, Australia.
  • Ting-Yuan Wang
    Institute of Biotechnology and Department of Life Science, National Tsing Hua University, Hsinchu, 30013, Taiwan.
  • Ryan Krueger
    Harvard University.
  • David H Mathews
    Department of Biochemistry & Biophysics, Center for RNA Biology, and Department of Biostatistics & Computational Biology, University of Rochester, Rochester, NY 14642, USA.
  • Max Ward
    Neurological Surgery, Donald and Barbara Zucker School of Medicine at Hofstra/Northwell, Hempstead, USA.
  • Sharon Aviran
    Department of Biomedical Engineering, University of California, Davis, Davis, CA, 95616, USA.
