Res-GCN: Identification of protein phosphorylation sites using graph convolutional network and residual network.

Journal: Computational biology and chemistry

Abstract

Phosphorylation, an essential post-translational modification, is closely related to a wide range of biological activities. Developing effective computational methods that accurately identify phosphorylation sites is important for an in-depth understanding of various physiological phenomena. Traditional experimental identification of phosphorylation sites, however, is time-consuming and laborious, which makes it difficult to meet the processing demands of today's big data. This study proposes a novel model, Res-GCN, to identify phosphorylation sites of SARS-CoV-2. First, eight feature-extraction strategies encode the protein sequences numerically from multiple perspectives: amino acid property encodings (AAindex), pseudo-amino acid composition (PseAAC), adapted normal distribution bi-profile Bayes (ANBPB), dipeptide composition (DC), binary encoding (BE), enhanced amino acid composition (EAAC), Word2Vec, and the BLOSUM62 matrix. Second, an elastic net removes redundant features from the fused feature matrix. Finally, a combination of a graph convolutional network (GCN) and a residual network (ResNet) classifies the phosphorylation sites, and a fully connected (FC) layer outputs the predictions. Res-GCN is evaluated with 5-fold cross-validation and independent testing, and it achieves excellent results on both the S/T (serine/threonine) and Y (tyrosine) datasets. These results demonstrate that Res-GCN has exceptional predictive performance and generalizability.
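Three of the eight encodings named in the abstract (BE, DC, and EAAC) are simple enough to sketch directly. The following is a minimal illustration, not the authors' code. The amino-acid ordering, the all-zero treatment of non-standard symbols such as a padding character `X`, the EAAC window size of 5 normalized by window length, and fusion by plain concatenation (`fused_features`, a hypothetical helper) are all assumptions.

```python
from itertools import product

# Assumed alphabet and ordering: the 20 standard amino acids.
# Any other symbol (e.g. an 'X' pad) contributes only zeros.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]
DP_INDEX = {dp: i for i, dp in enumerate(DIPEPTIDES)}


def binary_encoding(seq):
    """BE: one-hot vector of 20 values per residue, flattened."""
    vec = []
    for aa in seq:
        row = [0] * len(AMINO_ACIDS)
        if aa in AA_INDEX:
            row[AA_INDEX[aa]] = 1
        vec.extend(row)
    return vec


def dipeptide_composition(seq):
    """DC: relative frequency of each of the 400 adjacent residue pairs."""
    counts = [0] * len(DIPEPTIDES)
    pairs = [seq[i:i + 2] for i in range(len(seq) - 1)]
    valid = [p for p in pairs if p in DP_INDEX]
    for p in valid:
        counts[DP_INDEX[p]] += 1
    total = len(valid)
    return [c / total for c in counts] if total else counts


def eaac(seq, window=5):
    """EAAC: amino-acid composition of each sliding window, concatenated.
    Normalizing by the window length is an assumption."""
    vec = []
    for start in range(len(seq) - window + 1):
        counts = [0] * len(AMINO_ACIDS)
        for aa in seq[start:start + window]:
            if aa in AA_INDEX:
                counts[AA_INDEX[aa]] += 1
        vec.extend(c / window for c in counts)
    return vec


def fused_features(seq):
    """Hypothetical fusion step: concatenate the three encodings."""
    return binary_encoding(seq) + dipeptide_composition(seq) + eaac(seq)
```

For a 9-residue fragment, `fused_features` returns 9 x 20 BE values, 400 DC values, and 5 windows x 20 EAAC values, i.e. 680 features; the remaining encodings (AAindex, PseAAC, ANBPB, Word2Vec, BLOSUM62) would be appended the same way.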
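The elastic-net step can be illustrated with a plain coordinate-descent solver. This sketch assumes the scikit-learn-style objective (1/2n)||y - Xw||^2 + alpha * (l1_ratio * ||w||_1 + (1 - l1_ratio)/2 * ||w||^2), standardized feature columns, a centered label vector, and hypothetical default values for `alpha` and `l1_ratio`. The abstract does not say how Res-GCN parameterizes the elastic net, so keeping exactly the columns with nonzero coefficients is also an assumption.

```python
def soft_threshold(value, threshold):
    """Proximal operator of the L1 penalty."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def elastic_net(X, y, alpha=0.1, l1_ratio=0.5, n_iter=200):
    """Cyclic coordinate descent on the objective stated above.
    Assumes standardized columns of X and centered y (no intercept)."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    residual = list(y)  # y - Xw, with w starting at zero
    for _ in range(n_iter):
        for j in range(p):
            col = [row[j] for row in X]
            z = sum(v * v for v in col) / n
            if z == 0:
                continue  # constant-zero column carries no information
            # Correlation of feature j with the partial residual (feature j added back).
            rho = sum(col[i] * (residual[i] + col[i] * w[j]) for i in range(n)) / n
            new_wj = soft_threshold(rho, alpha * l1_ratio) / (z + alpha * (1 - l1_ratio))
            delta = new_wj - w[j]
            if delta:
                for i in range(n):
                    residual[i] -= col[i] * delta
                w[j] = new_wj
    return w


def select_features(X, y, **kwargs):
    """Indices of features whose elastic-net coefficient is nonzero."""
    w = elastic_net(X, y, **kwargs)
    return [j for j, wj in enumerate(w) if abs(wj) > 1e-8]
```

The L1 part of the penalty drives uninformative coefficients exactly to zero, while the L2 part keeps groups of correlated features from being dropped arbitrarily, which is the usual motivation for preferring an elastic net over a pure lasso on a fused feature matrix.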
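The GCN + ResNet classifier can be sketched with the standard GCN propagation rule of Kipf and Welling plus an identity skip connection in the ResNet style: H' = ReLU(D^-1/2 (A + I) D^-1/2 H W) + H. The abstract does not describe how the graph is built, how many layers are stacked, or how node features are pooled, so the adjacency matrix, the single layer, the mean pooling, and the sigmoid output of the FC layer below are hypothetical choices.

```python
import math


def matmul(A, B):
    """Plain-list matrix product A @ B."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def normalized_adjacency(adj):
    """Symmetric GCN normalization D^-1/2 (A + I) D^-1/2."""
    n = len(adj)
    a_tilde = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_tilde]
    return [[a_tilde[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]


def residual_gcn_layer(adj, H, W):
    """ReLU(A_norm H W) + H; W must be square so the skip connection lines up."""
    propagated = matmul(matmul(normalized_adjacency(adj), H), W)
    return [[max(0.0, p) + h for p, h in zip(p_row, h_row)]
            for p_row, h_row in zip(propagated, H)]


def predict(adj, H, W, fc_weights, fc_bias):
    """One residual GCN layer, mean pooling over nodes, then an FC layer with sigmoid."""
    H = residual_gcn_layer(adj, H, W)
    pooled = [sum(col) / len(H) for col in zip(*H)]
    logit = sum(w * x for w, x in zip(fc_weights, pooled)) + fc_bias
    return 1.0 / (1.0 + math.exp(-logit))
```

The skip connection lets each layer learn only a correction to its input features, which is what allows residual architectures to be stacked more deeply without the over-smoothing and vanishing-gradient problems of plain deep GCNs.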
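The 5-fold cross-validation protocol can be sketched as an index splitter: the samples are shuffled once, divided into five nearly equal folds, and each fold serves once as the validation set while the other four are used for training. The shuffling seed and the unstratified split are assumptions, since the abstract does not say whether the folds were stratified by class. The independent test set would be held out entirely and never appear in any fold.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train, validation) index lists for unstratified k-fold CV."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # The first n_samples % k folds receive one extra sample.
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(indices[start:start + size])
        start += size
    for i, validation in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, validation
```

In each of the five rounds a model would be trained on `train` and scored on `validation`, and the reported cross-validation metrics would be the average over the rounds.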

Authors

  • Minghui Wang
    College of Chemistry and Material Science, Shandong Agricultural University, Tai'an 271018, PR China.
  • Jihua Jia
    College of Mathematics and Physics, Qingdao University of Science and Technology, Qingdao 266061, China; School of Data Science, Qingdao University of Science and Technology, Qingdao 266061, China.
  • Fei Xu
    GeoHealth Initiative, Faculty of Geo-information Science and Earth Observation (ITC), University of Twente, Enschede, 7500, the Netherlands; International Initiative on Spatial Lifecourse Epidemiology (ISLE), the Netherlands; Nanjing Municipal Center for Disease Control and Prevention, Nanjing, Jiangsu, 210003, China; Department of Epidemiology, School of Public Health, Nanjing Medical University, Nanjing, Jiangsu, 211100, China.
  • Hongyan Zhou
    College of Chemistry and Chemical Engineering, Southwest Petroleum University, Chengdu 610500, People's Republic of China.
  • Yushuang Liu
    College of Mathematics and Physics, Qingdao University of Science and Technology, Qingdao 266061, China; Artificial Intelligence and Biomedical Big Data Research Center, Qingdao University of Science and Technology, Qingdao 266061, China. Electronic address: qustlys@126.com.
  • Bin Yu
    Department of Anesthesiology, Peking University First Hospital, Ningxia Women's and Children's Hospital, Yinchuan, China.