BERMP: a cross-species classifier for predicting m6A sites by integrating a deep learning algorithm and a random forest approach.
Journal:
International journal of biological sciences
Published Date:
Jan 1, 2018
Abstract
N6-methyladenosine (m6A) is a prevalent RNA methylation modification involved in several biological processes. Hundreds or thousands of m6A sites identified from different species using high-throughput experiments provide a rich resource for constructing approaches to identify m6A sites. Existing m6A predictors were developed using conventional machine-learning (ML) algorithms, and most are species-centric. In this paper, we develop a novel cross-species deep-learning classifier based on a bidirectional Gated Recurrent Unit (BGRU) for the prediction of m6A sites. In comparison with conventional ML approaches, BGRU achieves outstanding performance on the dataset that contains over fifty thousand m6A sites but inferior performance on the dataset that covers around a thousand positives. The accuracy of BGRU is thus sensitive to data size, and this sensitivity is offset by integrating a random forest classifier that uses a novel encoding of enhanced nucleic acid content. The integrated approach, dubbed BGRU-based Ensemble RNA Methylation site Predictor (BERMP), performs competitively in both cross-validation and independent tests. BERMP also outperforms existing m6A predictors for different species. Therefore, BERMP is a novel multi-species tool for identifying m6A sites with high confidence. This classifier is freely available at http://www.bioinfogo.org/bermp.
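The abstract does not specify how the enhanced nucleic acid content encoding is computed or how the two classifiers are combined. As a rough illustration only, the sketch below assumes the encoding is the nucleotide frequency inside each sliding window along the sequence, and that the two probabilities are merged by a weighted average. The function names `windowed_composition` and `combine_scores`, the window size, and the averaging rule are all hypothetical and are not taken from BERMP.

```python
from typing import List

NUCLEOTIDES = "ACGU"


def windowed_composition(seq: str, window: int = 5) -> List[float]:
    """Return the frequency of A, C, G, U in every sliding window of `seq`.

    Assumption: this is one plausible reading of a windowed
    nucleotide-content encoding, not the published definition.
    """
    seq = seq.upper().replace("T", "U")  # accept DNA-style input as well
    if window < 1 or window > len(seq):
        raise ValueError("window must be between 1 and len(seq)")
    features: List[float] = []
    for start in range(len(seq) - window + 1):
        chunk = seq[start:start + window]
        # Four values per window, in the fixed order A, C, G, U.
        features.extend(chunk.count(n) / window for n in NUCLEOTIDES)
    return features


def combine_scores(p_rnn: float, p_rf: float, weight: float = 0.5) -> float:
    """Weighted average of two positive-class probabilities.

    Assumption: a simple stand-in for the integration step; the
    abstract does not describe the actual combination rule.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    return weight * p_rnn + (1.0 - weight) * p_rf
```

In this sketch, a sequence of length n with window size w yields 4 × (n − w + 1) features, which could be passed to any random forest implementation, while `combine_scores` merges that model's probability with the one from a recurrent network.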