Ense-i6mA: Identification of DNA N-Methyladenine Sites Using XGB-RFE Feature Selection and Ensemble Machine Learning.

Journal: IEEE/ACM transactions on computational biology and bioinformatics
PMID:

Abstract

DNA N-methyladenine (6mA) is an important epigenetic modification that plays a vital role in various cellular processes. Accurate identification of 6mA sites is fundamental to elucidating the biological functions and mechanisms of this modification. However, experimental methods for detecting 6mA sites are expensive and time-consuming. In this study, we propose a novel computational method, called Ense-i6mA, to predict 6mA sites. First, five encoding schemes, i.e., one-hot encoding, gcContent, Z-Curve, K-mer nucleotide frequency, and K-mer nucleotide frequency with gap, are employed to extract DNA sequence features. Second, eXtreme gradient boosting coupled with recursive feature elimination (XGB-RFE) is applied to remove noisy features, which helps avoid over-fitting and reduces computing time and complexity. Then, the best subset of features is fed into base classifiers composed of Extra Trees, eXtreme Gradient Boosting, Light Gradient Boosting Machine, and Support Vector Machine. Finally, to minimize generalization error, the prediction probabilities of the base classifiers are averaged to infer the final 6mA site predictions. We conduct experiments on two species, i.e., Arabidopsis thaliana and Drosophila melanogaster, to compare the performance of Ense-i6mA against recent 6mA site prediction methods. The experimental results demonstrate that Ense-i6mA achieves area under the receiver operating characteristic curve values of 0.967 and 0.968, accuracies of 91.4% and 92.0%, and Matthews correlation coefficient values of 0.829 and 0.842 on the two benchmark datasets, respectively, and outperforms several existing state-of-the-art methods.
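The sequence encodings, probability averaging, and Matthews correlation coefficient mentioned in the abstract can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. Sequences are assumed to contain only A, C, G, and T. The Z-curve uses one common cumulative formulation. The function gapped_kmer_freq is a hypothetical variant that counts nucleotide pairs separated by a fixed gap, since the paper's exact definition is not given here. The 0.5 decision threshold and the demo probabilities are made up. The XGB-RFE selection step and the real base classifiers are omitted.

```python
from itertools import product
from math import sqrt

BASES = "ACGT"  # assumption: sequences contain only these four letters


def one_hot(seq):
    """Flattened one-hot encoding: 4 values per position, in A, C, G, T order."""
    vec = []
    for base in seq:
        vec.extend(1 if base == b else 0 for b in BASES)
    return vec


def gc_content(seq):
    """Fraction of G and C nucleotides in the sequence."""
    return (seq.count("G") + seq.count("C")) / len(seq)


def z_curve(seq):
    """Cumulative Z-curve coordinates (x, y, z) recorded after each position."""
    x = y = z = 0
    coords = []
    for base in seq:
        x += 1 if base in "AG" else -1  # purine vs pyrimidine
        y += 1 if base in "AC" else -1  # amino vs keto
        z += 1 if base in "AT" else -1  # weak vs strong hydrogen bonding
        coords.extend([x, y, z])
    return coords


def kmer_freq(seq, k):
    """Normalized frequency of every possible k-mer (4**k values)."""
    kmers = ["".join(p) for p in product(BASES, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = len(seq) - k + 1
    for i in range(total):
        counts[seq[i:i + k]] += 1  # raises KeyError on non-ACGT letters
    return [counts[m] / total for m in kmers]


def gapped_kmer_freq(seq, gap):
    """Hypothetical gapped variant: frequency of pairs (seq[i], seq[i + gap + 1])."""
    pairs = ["".join(p) for p in product(BASES, repeat=2)]
    counts = dict.fromkeys(pairs, 0)
    total = len(seq) - gap - 1
    for i in range(total):
        counts[seq[i] + seq[i + gap + 1]] += 1
    return [counts[p] / total for p in pairs]


def average_probabilities(prob_lists):
    """Soft voting: mean positive-class probability per sample across classifiers."""
    n = len(prob_lists)
    return [sum(ps) / n for ps in zip(*prob_lists)]


def mcc(y_true, y_pred):
    """Matthews correlation coefficient for binary labels (0 when undefined)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


if __name__ == "__main__":
    seq = "GATTACA"  # toy sequence, not from the benchmark datasets
    features = (one_hot(seq) + [gc_content(seq)] + z_curve(seq)
                + kmer_freq(seq, 2) + gapped_kmer_freq(seq, 1))
    print(len(features))
    # Made-up probabilities from four base classifiers for three samples.
    probs = [[0.9, 0.2, 0.6], [0.8, 0.3, 0.4], [0.95, 0.1, 0.5], [0.85, 0.4, 0.7]]
    avg = average_probabilities(probs)
    preds = [1 if p >= 0.5 else 0 for p in avg]  # assumed 0.5 threshold
    print(preds, mcc([1, 0, 1], preds))
```

Averaging probabilities (soft voting) lets a confident base classifier outweigh several uncertain ones, which a majority vote over hard labels would not do.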

Authors

  • Xueqiang Fan
    College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China.
  • Bing Lin
    Affiliated Hospital of Chengdu University of Traditional Chinese Medicine, No. 37 Shierqiao Avenue, Jinniu District, Chengdu 610075, China.
  • Jun Hu
    Jinling Clinical Medical College, Nanjing Medical University, Nanjing, Jiangsu 210002, China.
  • Zhongyi Guo