Fast and accurate microRNA search using CNN.

Journal: BMC bioinformatics
PMID:

Abstract

BACKGROUND: There are many different types of microRNAs (miRNAs), and elucidating their functions remains an area of intensive research. A fundamental step in the functional annotation of a new miRNA is to classify it into a characterized miRNA family, such as those in Rfam and miRBase. As annotated miRNAs accumulate, it has become possible to use deep learning-based models to classify different types of miRNAs. In this work, we investigate several key issues in applying deep learning models to miRNA classification. First, because secondary structure conservation is a prominent feature of noncoding RNAs, including miRNAs, we examine whether secondary structure-based encoding improves classification accuracy. Second, because non-miRNA sequences far outnumber miRNAs, instead of assigning a single negative class to all non-miRNA sequences, we test whether the softmax output can distinguish in-distribution from out-of-distribution samples. Finally, we investigate whether deep learning models can correctly classify sequences from small miRNA families.
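
To make the second question concrete, the following is a minimal sketch of how a softmax output might be used to reject out-of-distribution inputs instead of training an explicit negative class. It is an illustration under stated assumptions, not the authors' implementation: the function names, the example family labels, the logit values, and the 0.9 confidence threshold are all hypothetical choices made for this example.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def classify_with_rejection(logits, families, threshold=0.9):
    """Return (family, confidence), or (None, confidence) if rejected.

    Assumption: `logits` holds one raw score per known family, as
    produced by the last layer of some classifier. The threshold of
    0.9 is an arbitrary illustrative value, not one from the paper.
    """
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    if probs[best] < threshold:
        # Low top probability: treat the input as out-of-distribution.
        return None, probs[best]
    return families[best], probs[best]

# Hypothetical labels and scores, for illustration only.
families = ["mir-10", "let-7", "mir-17"]
confident = classify_with_rejection([8.0, 0.5, 0.1], families)
uncertain = classify_with_rejection([1.0, 1.1, 0.9], families)
```

In this sketch, a sequence whose top softmax probability falls below the threshold is left unassigned, which stands in for the "non-miRNA" outcome. In practice, a threshold of this kind would typically be tuned on held-out data.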

Authors

  • Xubo Tang
    Department of Electronic Engineering, City University of Hong Kong, Kowloon Tong, Hong Kong SAR.
  • Yanni Sun
    Department of Computer Science and Engineering, Michigan State University, East Lansing, MI 48824, USA.