Deepm5C: A deep-learning-based hybrid framework for identifying human RNA N5-methylcytosine sites using a stacking strategy.

Journal: Molecular therapy : the journal of the American Society of Gene Therapy

Published Date: May 6, 2022

Abstract

As one of the most prevalent post-transcriptional epigenetic modifications, N5-methylcytosine (m5C) plays an essential role in various cellular processes and disease pathogenesis. Therefore, it is important accurately identify m5C modifications in order to gain a deeper understanding of cellular processes and other possible functional mechanisms. Although a few computational methods have been proposed, their respective models have been developed using small training datasets. Hence, their practical application is quite limited in genome-wide detection. To overcome the existing limitations, we propose Deepm5C, a bioinformatics method for identifying RNA m5C sites throughout the human genome. To develop Deepm5C, we constructed a novel benchmarking dataset and investigated a mixture of three conventional feature-encoding algorithms and a feature derived from word-embedding approaches. Afterward, four variants of deep-learning classifiers and four commonly used conventional classifiers were employed and trained with the four encodings, ultimately obtaining 32 baseline models. A stacking strategy is effectively utilized by integrating the predicted output of the optimal baseline models and trained with a one-dimensional (1D) convolutional neural network. As a result, the Deepm5C predictor achieved excellent performance during cross-validation with a Matthews correlation coefficient and an accuracy of 0.697 and 0.855, respectively. The corresponding metrics during the independent test were 0.691 and 0.852, respectively. Overall, Deepm5C achieved a more accurate and stable performance than the baseline models and significantly outperformed the existing predictors, demonstrating the effectiveness of our proposed hybrid framework. Furthermore, Deepm5C is expected to assist community-wide efforts in identifying putative m5Cs and to formulate the novel testable biological hypothesis.

Authors

Md Mehedi Hasan

Nutrition and Clinical Services Division, International Center for Diarrheal Disease and Research, Bangladesh (icddr,b), Dhaka, Bangladesh.
Sho Tsukiyama

Department of Bioscience and Bioinformatics, Kyushu Institute of Technology, 680-4 Kawazu, Iizuka, Fukuoka 820-8502, Japan.
Jae Youl Cho

Department of Integrative Biotechnology, And Biomedical Institute for Convergence at SKKU (BICS), Sungkyunkwan University, Suwon, Republic of Korea.
Hiroyuki Kurata
Md Ashad Alam

Department of Biomedical Engineering, Tulane University, New Orleans, LA 70118, USA. Electronic address: malam@tulane.edu.
Xiaowen Liu

School of Informatics and Computing, Indiana University-Purdue University Indianapolis, Indianapolis, Indiana 46202, United States.
Balachandran Manavalan

Department of Physiology, Ajou University School of Medicine, Suwon, Republic of Korea.
Hong-Wen Deng

Center for Bioinformatics and Genomics, Department of Global Biostatistics and Data Science, Tulane University, New Orleans, LA 70112, USA.

Keywords

Algorithms Computational Biology Deep Learning Humans Machine Learning RNA

External Resources

View on PubMed Access via DOI PubMed (35526094)

Deepm5C: A deep-learning-based hybrid framework for identifying human RNA N5-methylcytosine sites using a stacking strategy.

Abstract

Authors

Keywords

External Resources

Popular Topics

Recent Journals