Predicting RNA Structure Utilizing Attention from Pretrained Language Models.

Journal: Journal of Chemical Information and Modeling
Published Date:

Abstract

RNA possesses functional significance that extends beyond the transport of genetic information. The functional roles of noncoding RNAs can be mediated through their secondary and tertiary structures, and thus, predicting RNA structure holds great promise for enabling their applications in diagnostics and therapeutics. However, predicting the three-dimensional (3D) structure of RNA remains challenging. Applying artificial intelligence techniques from natural language processing, in particular large language models (LLMs), could incorporate evolutionary information into RNA 3D structure prediction and address both resource and data scarcity limitations. This approach could achieve faster inference times while maintaining accuracy comparable to that of time-consuming multiple sequence alignment schemes, akin to its successful application in protein structure prediction. Herein, we evaluate the suitability of currently available pretrained nucleic acid language models (RNABERT, ERNIE-RNA, RNA Foundation Model (RNA-FM), RiboNucleic Acid Language Model (RiNALMo), and DNABERT) for predicting secondary and tertiary RNA structures. We demonstrate that current nucleic acid language models do not effectively capture structural information, mainly due to architectural constraints.
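
To make the attention-based idea in the title and abstract concrete, the sketch below shows one plausible way to pool per-head attention maps from a pretrained transformer into a residue-residue contact map for secondary structure. This is a minimal illustration, not the paper's actual protocol: the checkpoint name is a hypothetical placeholder, and the character-level tokenization and thresholding rule are assumptions made for the example.

```python
# Minimal sketch: turning LM attention maps into a crude RNA contact map.
# Assumes a BERT-style nucleic acid LM served via the HuggingFace
# transformers API; "example-org/rna-bert-base" is a hypothetical
# checkpoint name, not one of the models evaluated in the paper.
import torch
from transformers import AutoModel, AutoTokenizer

MODEL_NAME = "example-org/rna-bert-base"  # hypothetical placeholder

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModel.from_pretrained(MODEL_NAME, output_attentions=True)
model.eval()

sequence = "GGGAAACUUCGGUUUCCC"  # toy RNA hairpin sequence

# Assumes a character-level vocabulary; spacing out the bases mimics
# one-nucleotide-per-token schemes such as RNABERT's.
inputs = tokenizer(" ".join(sequence), return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# outputs.attentions is a tuple with one tensor per layer, each of shape
# (batch, heads, seq_len, seq_len). Average over layers and heads to get
# a single token-token attention map (special tokens included; a real
# pipeline would strip [CLS]/[SEP] positions first).
attn = torch.stack(outputs.attentions).mean(dim=(0, 2)).squeeze(0)

# Symmetrize (base pairing is undirected) and apply a simple
# mean-plus-one-standard-deviation threshold to call putative contacts.
contact_map = (attn + attn.T) / 2
threshold = contact_map.mean() + contact_map.std()
predicted_pairs = (contact_map > threshold).nonzero()
print(predicted_pairs)
```

In a real evaluation one would instead fit a lightweight probe (e.g., logistic regression over per-head attention features) against known base-pair annotations; the naive layer/head averaging above is only meant to show where structural signal would have to reside for such probes to succeed.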

Authors

  • Ioannis Papazoglou
    Biomedical Research Foundation, Academy of Athens, 4 Soranou Ephessiou, 11527 Athens, Greece.
  • Alexios Chatzigoulas
    Biomedical Research Foundation, Academy of Athens, 4 Soranou Ephessiou, 11527 Athens, Greece.
  • George Tsekenis
    Biomedical Research Foundation, Academy of Athens, 4 Soranou Ephessiou, 11527 Athens, Greece.
  • Zoe Cournia
    Biomedical Research Foundation, Academy of Athens, 4 Soranou Ephessiou, 11527 Athens, Greece.