PredDBP-Stack: Prediction of DNA-Binding Proteins from HMM Profiles using a Stacked Ensemble Method.

Journal: BioMed research international
Published Date:

Abstract

DNA-binding proteins (DBPs) play vital roles in all aspects of genetic activities. However, the identification of DBPs by using wet-lab experimental approaches is often time-consuming and laborious. In this study, we develop a novel computational method, called PredDBP-Stack, to predict DBPs solely based on protein sequences. First, amino acid composition (AAC) and transition probability composition (TPC) extracted from the hidden markov model (HMM) profile are adopted to represent a protein. Next, we establish a stacked ensemble model to identify DBPs, which involves two stages of learning. In the first stage, the four base classifiers are trained with the features of HMM-based compositions. In the second stage, the prediction probabilities of these base classifiers are used as inputs to the meta-classifier to perform the final prediction of DBPs. Based on the PDB1075 benchmark dataset, we conduct a jackknife cross validation with the proposed PredDBP-Stack predictor and obtain a balanced sensitivity and specificity of 92.47% and 92.36%, respectively. This outcome outperforms most of the existing classifiers. Furthermore, our method also achieves superior performance and model robustness on the PDB186 independent dataset. This demonstrates that the PredDBP-Stack is an effective classifier for accurately identifying DBPs based on protein sequence information alone.

Authors

  • Jun Wang
    Department of Speech, Language, and Hearing Sciences and the Department of Neurology, The University of Texas at Austin, Austin, TX 78712, USA.
  • Huiwen Zheng
    School of Engineering, University of Melbourne, Victoria 3010, Australia.
  • Yang Yang
    Department of Gastrointestinal Surgery, The Third Hospital of Hebei Medical University, Shijiazhuang, China.
  • Wanyue Xiao
    School of Information, Syracuse University, Syracuse, NY 13244, USA.
  • Taigang Liu
    College of Information, Shanghai Ocean University, Shanghai 201306, China.