Prediction of Protein-DNA Interface Hot Spots Based on Empirical Mode Decomposition and Machine Learning.

Journal: Genes
Published Date:

Abstract

Protein-DNA interactions play a crucial role in biological processes such as gene expression, modification, replication and transcription. Precise identification of hot spots at protein-DNA binding interfaces is essential both for understanding their physiological significance and for advancing computational biology. In this paper, we propose a hot spot prediction method called EC-PDH. First, we extracted conventional features of these hot spots based on solvent-accessible surface area (ASA) and secondary structure. We then applied the empirical mode decomposition (EMD) algorithm to these conventional features and extracted the mean, variance, energy and autocorrelation function values of the first three intrinsic mode functions (IMFs) as new features. In total, 218-dimensional features were obtained. For feature selection, we used the minimum redundancy maximum relevance method combined with sequential forward selection (mRMR-SFS) to obtain an optimal 11-dimensional feature subset. To address data imbalance, we used the SMOTE-Tomek algorithm to balance positive and negative samples, and finally used categorical boosting (CatBoost) to construct our hot spot prediction model for protein-DNA binding interfaces. Our method performs well on the test set, with AUC, MCC and F1 score values of 0.847, 0.543 and 0.772, respectively. In a comparative evaluation, EC-PDH outperformed existing state-of-the-art methods in identifying hot spots.
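
The EMD-derived features described in the abstract (mean, variance, energy and an autocorrelation value for each of the first three IMFs) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes the IMFs have already been produced by some EMD routine (not shown), that "energy" means the sum of squared amplitudes, and that the autocorrelation value is the normalised autocorrelation at lag 1. The function names `imf_features` and `emd_feature_vector` are hypothetical.

```python
from statistics import fmean


def imf_features(imf, lag=1):
    """Return (mean, variance, energy, autocorrelation at `lag`) for one IMF.

    Assumptions (not taken from the paper): population variance,
    energy = sum of squares, autocorrelation normalised by the
    zero-lag value of the mean-centred sequence.
    """
    n = len(imf)
    mu = fmean(imf)
    centered = [x - mu for x in imf]
    sum_sq_dev = sum(c * c for c in centered)
    variance = sum_sq_dev / n
    energy = sum(x * x for x in imf)
    if sum_sq_dev > 0:
        acf = sum(centered[i] * centered[i + lag] for i in range(n - lag)) / sum_sq_dev
    else:
        acf = 0.0  # a constant component has no defined autocorrelation; use 0.0
    return mu, variance, energy, acf


def emd_feature_vector(imfs, n_imfs=3, lag=1):
    """Concatenate the four statistics of the first `n_imfs` IMFs."""
    features = []
    for imf in imfs[:n_imfs]:
        features.extend(imf_features(imf, lag))
    return features


# Toy IMFs standing in for the output of an EMD routine (illustrative values only).
imfs = [
    [1.0, -1.0, 1.0, -1.0],
    [0.5, 0.5, -0.5, -0.5],
    [0.1, 0.2, 0.3, 0.4],
]
vec = emd_feature_vector(imfs)
```

Under these assumptions, each conventional feature sequence contributes 3 × 4 = 12 EMD-derived values. How these combine with the conventional features to reach the 218 dimensions reported above is not specified in the abstract.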

Authors

  • Zirui Fang
    School of Information and Artificial Intelligence, Anhui Agricultural University, Hefei 230036, China.
  • Zixuan Li
    Institute of Rehabilitation Engineering and Technology, University of Shanghai for Science and Technology, 516 Jungong Road, Shanghai, 200093, China.
  • Ming Li
    Radiology Department, Huadong Hospital, Affiliated with Fudan University, Shanghai, China.
  • Zhenyu Yue
    School of Information and Artificial Intelligence, Anhui Agricultural University, Hefei 230036, China.
  • Ke Li
    School of Ideological and Political Education, Shanghai Maritime University, Shanghai, China.