Leveraging Protein Dynamics to Identify Functional Phosphorylation Sites using Deep Learning Models.

Journal: Journal of chemical information and modeling
PMID:

Abstract

Accurate prediction of post-translational modifications (PTMs) is of great significance for understanding cellular processes, because PTMs modulate protein structure and dynamics. With the rapid growth of protein data at different "omics" levels, machine learning models have greatly improved PTM prediction. However, most machine learning models rely only on protein sequence and use little structural information. The lack of systematic analysis of the dynamics underlying PTMs largely limits the prediction of PTM function. In this research, we present two dynamics-centric deep learning models, cDL-PAU and cDL-FuncPhos, which incorporate sequence-, structure-, and dynamics-based features to elucidate the molecular basis and functional landscape of PTMs. cDL-PAU achieved area under the curve (AUC) scores of 0.804-0.888 for predicting phosphorylation, acetylation, and ubiquitination (PAU) sites, while cDL-FuncPhos achieved an AUC of 0.771 for predicting functional phosphorylation (FuncPhos) sites, displaying reliable improvements. Feature selection shows that the dynamics-based coupling and commute ability contribute strongly to the discovery of PAU sites and FuncPhos sites, suggesting an allosteric propensity for important PTMs. Applying cDL-FuncPhos to three oncoproteins not only corroborates its strong performance in FuncPhos prioritization but also provides insight into the physical basis of their functions. The source code and data sets of cDL-PAU and cDL-FuncPhos are available at https://github.com/ComputeSuda/PTM_ML.
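
The abstract does not define how "commute ability" is computed. As a rough illustration only, one common graph-theoretic quantity of this kind is the commute time between residues on a contact network, obtained from the pseudo-inverse of the Kirchhoff (graph Laplacian) matrix. The sketch below assumes C-alpha coordinates as input and a 7.0 Å contact cutoff. The function names `kirchhoff` and `commute_time` are hypothetical and are not taken from the authors' repository, and this should not be read as the method used in cDL-PAU or cDL-FuncPhos.

```python
import numpy as np


def kirchhoff(coords, cutoff=7.0):
    """Build a Kirchhoff (graph Laplacian) matrix from residue coordinates.

    Two residues are connected when their distance is <= cutoff.
    The 7.0 angstrom default is an assumption chosen for illustration.
    """
    coords = np.asarray(coords, dtype=float)
    # Pairwise Euclidean distances between all residues.
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    adj = (dist <= cutoff).astype(float)
    np.fill_diagonal(adj, 0.0)  # no self-contacts
    # Laplacian = degree matrix - adjacency matrix.
    return np.diag(adj.sum(axis=1)) - adj


def commute_time(lap):
    """Commute time C_ij = vol(G) * (L+_ii + L+_jj - 2 * L+_ij).

    L+ is the Moore-Penrose pseudo-inverse of the Laplacian and vol(G)
    is the sum of node degrees (the trace of the Laplacian).
    Assumes the contact network is connected.
    """
    pinv = np.linalg.pinv(lap)
    diag = np.diag(pinv)
    volume = np.trace(lap)
    return volume * (diag[:, None] + diag[None, :] - 2.0 * pinv)


if __name__ == "__main__":
    # Toy chain of three "residues" spaced 5 angstroms apart.
    toy_coords = [[0.0, 0.0, 0.0], [5.0, 0.0, 0.0], [10.0, 0.0, 0.0]]
    print(commute_time(kirchhoff(toy_coords)))
```

For the three-residue toy chain, neighbouring residues have a commute time of 4 and the two end residues have a commute time of 8 (up to floating-point rounding), reflecting that signals take longer to travel between residues that are farther apart on the network. A per-residue feature could then be derived, for example, by averaging each row, although that choice is also an assumption here.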

Authors

  • Fei Zhu
    Collaborative Innovation Center of Novel Software Technology and Industrialization, People's Republic of China. zhufei@suda.edu.cn.
  • Sijie Yang
    School of Computer Science and Technology, Soochow University, Suzhou 215006, China.
  • Fanwang Meng
    Department of Chemistry and Chemical Biology, McMaster University, Hamilton, ON, Canada.
  • Yuxiang Zheng
    Center for Systems Biology, Department of Bioinformatics, School of Biology and Basic Medical Sciences, Soochow University, Suzhou 215123, China.
  • Xin Ku
    Key Laboratory of Systems Biomedicine (Ministry of Education), Shanghai Center for Systems Biomedicine, Shanghai Jiao Tong University, Shanghai 200240, China.
  • Cheng Luo
    Department of Cardiology, Liuzhou Workers' Hospital, The Fourth Affiliated Hospital of Guangxi Medical University, Liuzhou, China.
  • Guang Hu
    Epigenetics & Stem Cell Biology Laboratory, National Institute of Environmental Health Sciences, National Institutes of Health, Research Triangle Park, Durham, North Carolina, United States of America.
  • Zhongjie Liang
    Center for Systems Biology, Department of Bioinformatics, School of Biology and Basic Medical Sciences, Soochow University, Suzhou 215123, China.