A sequence-based two-layer predictor for identifying enhancers and their strength through enhanced feature extraction.

Journal: Journal of bioinformatics and computational biology
Published Date:

Abstract

Enhancers are short regulatory DNA fragments that are bound with proteins called activators. They are free-bound and distant elements, which play a vital role in controlling gene expression. It is challenging to identify enhancers and their strength due to their dynamic nature. Although some machine learning methods exist to accelerate identification process, their prediction accuracy and efficiency will need more improvement. In this regard, we propose a two-layer prediction model with enhanced feature extraction strategy which does feature combination from improved position-specific amino acid propensity (PSTKNC) method along with Enhanced Nucleic Acid Composition (ENAC) and Composition of k-spaced Nucleic Acid Pairs (CKSNAP). The feature sets from all three feature extraction approaches were concatenated and then sent through a simple artificial neural network (ANN) to accurately identify enhancers in the first layer and their strength in the second layer. Experiments are conducted on benchmark chromatin nine cell lines dataset. A 10-fold cross validation method is employed to evaluate model's performance. The results show that the proposed model gives an outstanding performance with 94.50%, 0.8903 of accuracy and Matthew's correlation coefficient (MCC) in predicting enhancers and fairly does well with independent test also when compared with all other existing methods.

Authors

  • Santhosh Amilpur
    Computer Science and Engineering, National Institute of Technology Warangal, Warangal, Telangana 506004, India.
  • Raju Bhukya
    Computer Science and Engineering, National Institute of Technology Warangal, Warangal, Telangana 506004, India.