SnoReport 2.0: new features and a refined Support Vector Machine to improve snoRNA identification.

Journal: BMC Bioinformatics
PMID:

Abstract

BACKGROUND: snoReport uses RNA secondary structure prediction combined with machine learning as the basis to identify the two main classes of small nucleolar RNAs, the box H/ACA snoRNAs and the box C/D snoRNAs. Here, we present snoReport 2.0, which substantially improves and extends the original method by: extracting new features for both box C/D and box H/ACA snoRNAs; developing a more sophisticated technique in the SVM training phase, with recent data from vertebrate organisms and a careful choice of the SVM parameters C and γ; and using updated versions of the tools and databases used for the construction of the original version of snoReport. To validate the new version and to demonstrate its improved performance, we tested snoReport 2.0 in different organisms.
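
The choice of the SVM parameters C and γ mentioned above can be illustrated with a small sketch. It assumes an RBF kernel (the usual setting in which γ appears) and a log-spaced grid of candidate values; the abstract does not state the kernel, the grid, or the selection criterion, so the exponent ranges and the names `rbf_kernel`, `log2_grid`, `grid_search`, and `score` are hypothetical and serve only as an illustration, not as the snoReport 2.0 procedure.

```python
import math
from itertools import product


def rbf_kernel(x, y, gamma):
    """RBF kernel K(x, y) = exp(-gamma * ||x - y||^2) (assumed kernel)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def log2_grid(lo, hi, step=2):
    """Powers of two 2**lo, 2**(lo + step), ..., not exceeding 2**hi."""
    return [2.0 ** e for e in range(lo, hi + 1, step)]


def grid_search(score, c_values, gamma_values):
    """Return (best_score, C, gamma) for the pair that maximizes score(C, gamma).

    `score` is a caller-supplied function, for example the mean
    cross-validation accuracy of an SVM trained with those parameters.
    """
    return max((score(c, g), c, g) for c, g in product(c_values, gamma_values))


# Illustrative exponent ranges only; not taken from the paper.
C_GRID = log2_grid(-5, 15)
GAMMA_GRID = log2_grid(-15, 3)
```

In practice, `score` would train an SVM on the training folds for each (C, γ) pair and return a cross-validated performance measure; the pair with the highest score is then used to train the final model.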

Authors

  • João Victor de Araujo Oliveira
    Department of Computer Science, University of Brasilia, Brasília, BR-70910-900, Brazil. joaovicers@gmail.com.
  • Fabrizio Costa
    Bioinformatics Group, Department of Computer Science, Albert-Ludwigs-University Freiburg, Georges-Köhler-Allee 106, Freiburg, 79110, Germany.
  • Rolf Backofen
    Bioinformatics Group, Department of Computer Science, Albert-Ludwigs-University Freiburg, Georges-Köhler-Allee 106, Freiburg, 79110, Germany.
  • Peter Florian Stadler
    Bioinformatics Group, Department of Computer Science, and Interdisciplinary Center for Bioinformatics, University of Leipzig, Haertelstraße 16-18, Leipzig, D-04107, Germany.
  • Maria Emília Machado Telles Walter
    Department of Computer Science, University of Brasilia, Brasília, BR-70910-900, Brazil.
  • Jana Hertel
    Bioinformatics Group, Department of Computer Science, and Interdisciplinary Center for Bioinformatics, University of Leipzig, Haertelstraße 16-18, Leipzig, D-04107, Germany.