GINClus: RNA structural motif clustering using graph isomorphism network.

Journal: NAR genomics and bioinformatics
Published Date:

Abstract

Ribonucleic acid (RNA) structural motif identification is a crucial step for understanding RNA structure and function. Due to the complexity and variability of RNA 3D structures, identifying RNA structural motifs is challenging and time-consuming. In particular, discovering new RNA structural motif families is a hard problem that still depends largely on manual analysis. In this paper, we propose an RNA structural motif clustering tool, named GINClus, which uses a semi-supervised deep learning model to cluster RNA motif candidates (RNA loop regions) based on both base interaction and 3D structure similarities. GINClus converts the base interactions and 3D structures of RNA motif candidates into graph representations. Using a graph isomorphism network (GIN) model in combination with k-means and hierarchical agglomerative clustering, it then clusters the motif candidates based on their structural similarities. GINClus has a clustering accuracy of 87.88% for known internal loop motifs and 97.69% for known hairpin loop motifs. Using GINClus, we successfully clustered motifs of the same families together and found 927 new instances of the Sarcin-ricin, Kink-turn, Tandem-shear, Hook-turn, E-loop, C-loop, T-loop, and GNRA loop motif families. We also identified 12 new RNA structural motif families with unique structures and base-pair interactions.
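
To make the graph-embedding idea in the abstract concrete, the following is a minimal sketch of a GIN-style neighborhood aggregation followed by a sum readout, written in plain Python. It is not the GINClus implementation: the toy graph, the two-dimensional node features, the `eps` value, and the function names `gin_layer` and `graph_embedding` are all illustrative assumptions, and a ReLU stands in for the learned multilayer perceptron that a real GIN layer would apply. In a pipeline like the one the abstract describes, embeddings of this kind would presumably be passed on to k-means and hierarchical agglomerative clustering.

```python
# Hypothetical sketch of GIN-style aggregation; not the authors' code.
# Nodes stand for nucleotides, edges for base interactions (assumed encoding).

def gin_layer(features, edges, eps=0.0):
    """Compute ReLU((1 + eps) * h_v + sum of neighbor features) per node."""
    n = len(features)
    dim = len(features[0])
    neighbors = {v: [] for v in range(n)}
    for u, v in edges:  # treat every edge as undirected
        neighbors[u].append(v)
        neighbors[v].append(u)
    updated = []
    for v in range(n):
        agg = [(1.0 + eps) * x for x in features[v]]
        for u in neighbors[v]:
            for i in range(dim):
                agg[i] += features[u][i]
        # ReLU is a stand-in for the learned MLP of a real GIN layer.
        updated.append([max(0.0, x) for x in agg])
    return updated


def graph_embedding(features, edges, layers=2, eps=0.0):
    """Stack GIN-style layers, then sum node vectors into one graph vector."""
    h = features
    for _ in range(layers):
        h = gin_layer(h, edges, eps)
    dim = len(h[0])
    return [sum(node[i] for node in h) for i in range(dim)]


# Toy 3-node path graph with made-up one-hot node features.
toy_features = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
toy_edges = [(0, 1), (1, 2)]
embedding = graph_embedding(toy_features, toy_edges)

# The same graph with its nodes relabeled yields the same embedding,
# because sum aggregation and sum readout ignore node ordering.
relabeled = graph_embedding([[0.0, 1.0], [1.0, 0.0], [1.0, 0.0]],
                            [(0, 1), (0, 2)])
```

Sum aggregation is what makes the embedding independent of node numbering, which matters when the same motif can appear with different residue indices in different structures.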

Authors

  • Nabila Shahnaz Khan
    Department of Computer Science, University of Central Florida, Orlando, FL 32816, United States.
  • Md Mahfuzur Rahaman
    Department of Computer Science, University of Central Florida, Orlando, FL 32816, United States.
  • Shaojie Zhang
    Department of Computer Science, University of Central Florida, Orlando, FL 32816-2362, United States.