A Grid Search-Based Multilayer Dynamic Ensemble System to Identify DNA N4-Methylcytosine Using Deep Learning Approach.

Journal: Genes
PMID:

Abstract

DNA (deoxyribonucleic acid) N4-methylcytosine (4mC), a kind of epigenetic modification of DNA, is important for modulating gene functions, such as protein interactions, DNA conformation, and DNA stability, as well as for controlling gene expression during cell development and genomic imprinting. It also plays a crucial role in the restriction-modification system. To further understand the function and regulatory mechanism of 4mC, it is essential to locate 4mC sites precisely and determine their chromosomal distribution. This research aims to design an efficient, high-throughput, discriminative intelligent computational system that predicts 4mC sites using the natural language processing method "word2vec" and a multi-configured one-dimensional convolutional neural network (1D CNN). In this article, we propose a grid search-based multi-layer dynamic ensemble system (GS-MLDS) that builds on the knowledge gained at each level. Each layer uses a grid search-based weight-searching approach to find the weights that give the best accuracy while minimizing computation time and the need for additional layers. We used eight publicly available benchmark datasets collected from different sources to test the proposed model's efficiency. The test accuracies obtained were 0.978, 0.954, 0.944, 0.961, 0.950, 0.973, 0.948, 0.952, 0.961, and 0.980. The proposed model was also compared with 16 distinct models, and the comparison indicates that it can accurately predict 4mC sites.
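The abstract states that each ensemble layer uses a grid search to find combination weights, but it does not give the grid, step size, or combination rule. The sketch below is therefore only an illustration under stated assumptions: binary labels (4mC site or not), base models that output P(class = 1), a weighted average of those probabilities with non-negative weights summing to 1, a 0.5 decision threshold, and accuracy as the selection criterion. The function name `grid_search_weights` and the `step` parameter are hypothetical, not taken from GS-MLDS.

```python
import itertools

import numpy as np


def grid_search_weights(probs, y_true, step=0.1):
    """Hypothetical sketch: exhaustively search weight vectors on a grid.

    Assumptions (not from the paper): weights are non-negative, lie on a
    regular grid with spacing `step`, and sum to 1; the ensemble output is
    the weighted average of base-model probabilities, thresholded at 0.5.

    probs:  shape (n_models, n_samples), P(class = 1) from each base model
    y_true: shape (n_samples,), binary labels (1 = 4mC site, 0 = non-site)
    Returns (best_weights, best_accuracy).
    """
    probs = np.asarray(probs, dtype=float)
    y_true = np.asarray(y_true)
    n_models = probs.shape[0]

    # Candidate values for each weight, e.g. 0.0, 0.1, ..., 1.0 for step=0.1.
    grid = np.round(np.arange(0.0, 1.0 + step / 2, step), 10)

    best_w, best_acc = None, -1.0
    for w in itertools.product(grid, repeat=n_models):
        if not np.isclose(sum(w), 1.0):
            continue  # keep only weight vectors that sum to 1
        combined = np.dot(w, probs)  # weighted average probability per sample
        acc = np.mean((combined >= 0.5).astype(int) == y_true)
        if acc > best_acc:  # strict ">" keeps the first best weights found
            best_w, best_acc = np.array(w), acc
    return best_w, best_acc


if __name__ == "__main__":
    # Toy probabilities from two hypothetical base models on four samples.
    probs = [[0.9, 0.2, 0.6, 0.4],
             [0.6, 0.4, 0.3, 0.7]]
    labels = [1, 0, 1, 0]
    weights, accuracy = grid_search_weights(probs, labels, step=0.5)
    print(weights, accuracy)
```

In this toy case the first model alone classifies all four samples correctly, so the search assigns it weight 1.0. Exhaustive search grows as (1/step + 1) to the power of the number of models, which is one plausible reason to keep each layer small; in a multi-layer arrangement, the combined probabilities of one layer could be fed to the next as an additional input, though the abstract does not specify how GS-MLDS stacks its layers.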

Authors

  • Rajib Kumar Halder
    Department of Computer Science and Engineering, Jagannath University, Dhaka 1100, Bangladesh.
  • Mohammed Nasir Uddin
    Department of Computer Science and Engineering, Jagannath University, Dhaka 1100, Bangladesh.
  • Md Ashraf Uddin
    School of Information Technology, Deakin University, Geelong 3125, Australia.
  • Sunil Aryal
    School of Information Technology, Deakin University, Geelong 3125, Australia.
  • Md Aminul Islam
    COVID-19 Diagnostic Lab, Department of Microbiology, Noakhali Science and Technology University, Noakhali, 3814, Bangladesh; Advanced Molecular Lab, Department of Microbiology, President Abdul Hamid Medical College, Karimganj, Kishoreganj, Bangladesh.
  • Fahima Hossain
    Department of Computer Science and Engineering, Hamdard University Bangladesh, Munshiganj 1510, Bangladesh.
  • Nusrat Jahan
    Department of Computer Science and Engineering, Eastern University, Dhaka 1345, Bangladesh.
  • Ansam Khraisat
    School of Information Technology, Deakin University, Geelong 3125, Australia.
  • Ammar Alazab
    School of IT, Melbourne Institute of Technology, Melbourne 3000, Australia.