Machine learning algorithm for precise prediction of 2'-O-methylation (Nm) sites from experimental RiboMethSeq datasets.

Journal: Methods (San Diego, Calif.)
Published Date:

Abstract

Analysis of epitranscriptomic RNA modifications by deep sequencing-based approaches makes an essential contribution to general knowledge of their precise locations and relative stoichiometry in cellular RNAs. To reveal RNA modifications, several analytical approaches have been proposed, including antibody-driven enrichment, analysis of RT signatures and specific chemical treatments. However, analysis and interpretation of these massive datasets, especially for low-abundance cellular RNAs (e.g. mRNA and lncRNA), is neither easy nor straightforward, since insufficient specificity and selectivity lead to numerous false-positive and false-negative identifications. The main issue in the application of these methods lies in the subjective classification of potentially modified positions, mostly based on arbitrarily defined threshold values for different scores. Such an approach, using predefined score values, proved appropriate for datasets of limited complexity (tRNA and/or rRNA analysis), but application to longer reference sequences requires much better classification algorithms. In this work we applied a machine learning algorithm (Random Forest, RF) to create a predictive model for analysis of 2'-O-methylated sites in RNA using RiboMethSeq datasets. Model training was performed on a large collection of human rRNA datasets with well-known modification profiles, and prediction performance was assessed using experimentally defined profiles for other eukaryotic rRNAs (S. cerevisiae and A. thaliana). Application of this Random Forest prediction model to the detection of other RNA modifications and to more complex datasets is discussed.
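
The contrast drawn in the abstract, between a fixed threshold on a single score and a Random Forest trained on several scores at once, can be illustrated with a minimal sketch. It is not the authors' model or pipeline. It assumes scikit-learn and NumPy are available, the data are entirely synthetic, and the three per-position scores, the 0.5 cutoff and all parameter values are hypothetical placeholders chosen only for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier


def make_synthetic_positions(n_positions, modified_fraction, rng):
    """Return (X, y): three made-up per-position scores and 0/1 labels.

    Assumption: modified positions score higher on average. Real
    RiboMethSeq scores would be computed from sequencing coverage.
    """
    y = (rng.random(n_positions) < modified_fraction).astype(int)
    # Class centre per position (0.8 if modified, 0.2 otherwise), shape (n, 1)
    centers = np.where(y[:, None] == 1, 0.8, 0.2)
    # Add independent Gaussian noise to each of the three hypothetical scores
    X = np.clip(centers + rng.normal(0.0, 0.15, size=(n_positions, 3)), 0.0, 1.0)
    return X, y


def accuracy(predicted, truth):
    """Fraction of positions classified correctly."""
    return float(np.mean(predicted == truth))


rng = np.random.default_rng(0)
# Stand-ins for a training reference and an independent test reference
X_train, y_train = make_synthetic_positions(2000, 0.2, rng)
X_test, y_test = make_synthetic_positions(1000, 0.2, rng)

# Random Forest that uses all three scores jointly
model = RandomForestClassifier(
    n_estimators=200, class_weight="balanced", random_state=0
)
model.fit(X_train, y_train)
rf_pred = model.predict(X_test)

# Baseline: arbitrary fixed cutoff applied to the first score only
threshold_pred = (X_test[:, 0] > 0.5).astype(int)

rf_accuracy = accuracy(rf_pred, y_test)
threshold_accuracy = accuracy(threshold_pred, y_test)
print(f"Random Forest accuracy:    {rf_accuracy:.3f}")
print(f"Single-threshold accuracy: {threshold_accuracy:.3f}")
```

In this sketch the forest is fitted on one synthetic set and evaluated on a separately generated one, mirroring in spirit only the training on one reference and assessment on others described above. With real data, the feature table and labels would come from RiboMethSeq counts and known modification profiles.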

Authors

  • Florian Pichot
    Institute of Pharmacy and Biochemistry, Johannes Gutenberg University Mainz, Mainz, Germany; Université de Lorraine, CNRS, INSERM, UAR2008/US40 IBSLor, EpiRNA-Seq Core facility, Nancy F-54000, France.
  • Virginie Marchand
    Université de Lorraine, CNRS, INSERM, UAR2008/US40 IBSLor, EpiRNA-Seq Core facility, Nancy F-54000, France.
  • Mark Helm
    Institute of Pharmacy and Biochemistry, Johannes Gutenberg University Mainz, Mainz, Germany.
  • Yuri Motorin
    Université de Lorraine, CNRS, INSERM, UAR2008/US40 IBSLor, EpiRNA-Seq Core facility, Nancy F-54000, France; Université de Lorraine, CNRS, UMR7365 IMoPA, Nancy F-54000, France. Electronic address: motorine5@univ-lorraine.fr.