Improving Enhancer Identification with a Multi-Classifier Stacked Ensemble Model.

Journal: Journal of Molecular Biology
PMID:

Abstract

Enhancers are DNA regions that control the expression of genes. They are usually found upstream or downstream of a gene, or within a gene's introns, but normally lie far from the genes they control. Enhancers can be uncovered within DNA sequences by integrating experimental and computational approaches. Experimental techniques such as ChIP-seq and ATAC-seq identify genomic regions that are bound by transcription factors or accessible to regulatory proteins, whereas computational techniques predict enhancers from sequence features and epigenetic modifications. In this study, we developed a multi-classifier stacked ensemble (MCSE-enhancer) model that can accurately identify enhancers. We used feature descriptors derived from various physicochemical properties as input to six baseline classifiers and built a stacked classifier on top of them, which outperformed previous enhancer classification techniques in terms of accuracy, specificity, sensitivity, and Matthews correlation coefficient. Our model achieved an accuracy of 81.5%, a 2-3% improvement over existing models.
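
The four evaluation measures named in the abstract can all be computed from the confusion matrix of a binary enhancer/non-enhancer classifier. The following is a minimal sketch of those standard definitions, not the authors' evaluation code; the function name `binary_metrics` and the toy labels are assumptions made purely for illustration and are not data from the study.

```python
import math


def binary_metrics(y_true, y_pred):
    """Compute accuracy, sensitivity, specificity and MCC for 0/1 labels.

    Hypothetical helper for illustration: 1 = enhancer, 0 = non-enhancer.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    def safe_div(numerator, denominator):
        # Return 0.0 instead of raising when a class is absent.
        return numerator / denominator if denominator else 0.0

    mcc_denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": safe_div(tp + tn, tp + tn + fp + fn),
        "sensitivity": safe_div(tp, tp + fn),  # recall on enhancers
        "specificity": safe_div(tn, tn + fp),  # recall on non-enhancers
        "mcc": safe_div(tp * tn - fp * fn, mcc_denominator),
    }


# Toy labels (assumed, not taken from the paper).
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

Unlike accuracy, the Matthews correlation coefficient uses all four confusion-matrix cells, so it stays informative when the enhancer and non-enhancer classes are imbalanced.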

Authors

  • Bilal Ahmad Mir
    Department of Electronics and Information Engineering, Jeonbuk National University, Jeonju 54896, South Korea. Electronic address: bilalmir93@jbnu.ac.kr.
  • Mobeen Ur Rehman
  • Hilal Tayara
    Department of Electronics and Information Engineering, Chonbuk National University, Jeonju 54896, South Korea. Electronic address: hilaltayara@jbnu.ac.kr.
  • Kil To Chong
    Division of Electronic Engineering, and Advanced Research Center of Electronics and Information, Chonbuk National University, Jeonju-Si 54896, South Korea. Electronic address: kitchong@jbnu.ac.kr.