RNA contact prediction by data efficient deep learning.

Journal: Communications biology
PMID:

Abstract

On the path to a full understanding of the structure-function relationship of RNA, or even to RNA design, structure prediction would offer an intriguing complement to experimental efforts. Any deep learning on RNA structure, however, is hampered by the sparsity of labeled training data. Making use of the limited data available, we focus here on predicting spatial adjacencies ("contact maps") as a proxy for 3D structure. Our model, BARNACLE, combines the use of unlabeled data through self-supervised pre-training with efficient use of the sparse labeled data through an XGBoost classifier. BARNACLE shows a considerable improvement over both the established classical baseline and a deep neural network. To demonstrate that our approach can be applied to tasks with similar data constraints, we show that our findings generalize to the related setting of accessible surface area prediction.
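A contact map is the binary matrix that records which pairs of nucleotides are spatially adjacent, and it serves as the prediction target described above. The sketch below shows one common way such a map can be derived from 3D coordinates. It is an illustration under stated assumptions, not the labeling procedure used for BARNACLE. In particular, it represents each nucleotide by a single point, and the 8 Å distance cutoff and the minimum sequence separation of 4 are hypothetical defaults chosen only for the example.

```python
import math
from typing import List, Sequence, Tuple

# One representative point per nucleotide (assumption: a single atom per residue).
Coord = Tuple[float, float, float]


def contact_map(
    coords: Sequence[Coord],
    threshold: float = 8.0,   # hypothetical distance cutoff, assumed to be in Angstrom
    min_separation: int = 4,  # hypothetical: ignore pairs this close in sequence
) -> List[List[int]]:
    """Return a symmetric binary contact map.

    Entry [i][j] is 1 if nucleotides i and j lie closer than `threshold`
    in space and are at least `min_separation` positions apart in sequence.
    """
    n = len(coords)
    cmap = [[0] * n for _ in range(n)]
    for i in range(n):
        # Start at i + min_separation so that trivial near-diagonal pairs are skipped.
        for j in range(i + min_separation, n):
            if math.dist(coords[i], coords[j]) < threshold:
                cmap[i][j] = cmap[j][i] = 1
    return cmap


# Toy hairpin-like chain: the two ends fold back toward each other.
coords = [(0, 0, 0), (3, 0, 0), (6, 0, 0), (6, 5, 0), (3, 5, 0), (0, 5, 0)]
cmap = contact_map(coords)
for row in cmap:
    print(row)
```

In this toy chain, the pairs (0, 4), (0, 5) and (1, 5) are each within the cutoff, so they are marked as contacts. The adjacent pair (0, 1) is never considered because it falls below the minimum sequence separation.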

Authors

  • Oskar Taubert
    Steinbuch Centre for Computing (SCC), Karlsruhe Institute of Technology, 76344, Eggenstein-Leopoldshafen, Germany.
  • Fabrice von der Lehr
    Institute for Software Technology (SC), German Aerospace Centre (DLR), 51147, Köln, Germany.
  • Alina Bazarova
    Institute of Translational Medicine, Birmingham, UK.
  • Christian Faber
    Jülich Supercomputing Centre, Forschungszentrum Jülich, 52428, Jülich, Germany.
  • Philipp Knechtges
    Institute for Software Technology (SC), German Aerospace Centre (DLR), 51147, Köln, Germany.
  • Marie Weiel
    Steinbuch Centre for Computing (SCC), Karlsruhe Institute of Technology, 76344, Eggenstein-Leopoldshafen, Germany.
  • Charlotte Debus
    Steinbuch Centre for Computing (SCC), Karlsruhe Institute of Technology, 76344, Eggenstein-Leopoldshafen, Germany.
  • Daniel Coquelin
    Steinbuch Centre for Computing (SCC), Karlsruhe Institute of Technology, 76344, Eggenstein-Leopoldshafen, Germany.
  • Achim Basermann
    Institute for Software Technology (SC), German Aerospace Centre (DLR), 51147, Köln, Germany.
  • Achim Streit
    Steinbuch Centre for Computing (SCC), Karlsruhe Institute of Technology, 76344, Eggenstein-Leopoldshafen, Germany.
  • Stefan Kesselheim
    Jülich Supercomputing Centre, Forschungszentrum Jülich, 52428, Jülich, Germany.
  • Markus Götz
    Steinbuch Centre for Computing (SCC), Karlsruhe Institute of Technology, 76344, Eggenstein-Leopoldshafen, Germany. markus.goetz@kit.edu.
  • Alexander Schug
    John von Neumann Institute for Computing, Jülich Supercomputing Centre, Forschungszentrum Jülich, 52428, Jülich, Germany.