PhosBERT: A self-supervised learning model for identifying phosphorylation sites in SARS-CoV-2-infected human cells.

Journal: Methods (San Diego, Calif.)
PMID:

Abstract

Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is a single-stranded RNA virus that mainly causes respiratory and enteric diseases and is responsible for the outbreak of coronavirus disease 2019 (COVID-19). Numerous studies have demonstrated that SARS-CoV-2 infection leads to significant dysregulation of the protein post-translational modification profile in human cells. Accurate recognition of phosphorylation sites in host cells will contribute to a deeper understanding of the pathogenic mechanisms of SARS-CoV-2 and help to screen drugs and compounds with antiviral potential. Therefore, there is a need to develop cost-effective and high-precision computational strategies for specifically identifying phosphorylation sites in SARS-CoV-2-infected cells. In this work, we first implemented a custom neural network model (named PhosBERT) on the basis of the pre-trained protein language model ProtBert, a self-supervised learning approach built on the Bidirectional Encoder Representations from Transformers (BERT) architecture. PhosBERT was then trained and validated with 5-fold cross-validation on a serine (S) and threonine (T) phosphorylation dataset and on a tyrosine (Y) phosphorylation dataset, respectively. Independent validation showed that PhosBERT identified S/T phosphorylation sites with an accuracy of 81.9% and an AUC (area under the receiver operating characteristic curve) of 0.896. For Y phosphorylation sites, the prediction accuracy and AUC reached 87.1% and 0.902. These results indicate that the proposed model has good predictive ability and stability and could provide a new approach for studying phosphorylation sites in SARS-CoV-2-infected cells.
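
The evaluation protocol named in the abstract (5-fold cross-validation, accuracy, and AUC) can be illustrated with a minimal, dependency-free sketch. This is not the authors' code: the abstract does not say how folds were drawn or which decision threshold was used, so the stratified split, the 0.5 threshold, and the function names `stratified_kfold`, `accuracy`, and `roc_auc` are all assumptions made for illustration.

```python
import random


def stratified_kfold(labels, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs, keeping class ratios similar per fold.

    Assumption: stratified splitting; the abstract only says "5-fold".
    """
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        # Deal the shuffled indices of this class round-robin into the folds.
        for j, i in enumerate(idx):
            folds[j % k].append(i)
    for f in range(k):
        test = sorted(folds[f])
        train = sorted(i for g in range(k) if g != f for i in folds[g])
        yield train, test


def accuracy(labels, scores, threshold=0.5):
    """Fraction of correct predictions at a fixed threshold (0.5 is assumed)."""
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise ranking.

    Equals the probability that a randomly chosen positive site scores
    higher than a randomly chosen negative site; ties count as 0.5.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

For example, with labels `[1, 1, 0, 0]` and predicted scores `[0.9, 0.4, 0.6, 0.1]`, `accuracy` returns 0.5 and `roc_auc` returns 0.75, because three of the four positive-negative pairs are ranked correctly. The pairwise AUC is quadratic in the number of sites, which is fine for a small illustration but slower than a sort-based implementation on large datasets.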

Authors

  • Yong Li
    Department of Surgical Sciences, Western Michigan University Homer Stryker M.D. School of Medicine, Kalamazoo, MI, United States.
  • Ru Gao
    The People's Hospital of Ya'an, Ya'an 625000, Sichuan, China; The People's Hospital of Wenjiang Chengdu, Chengdu 611130, Sichuan, China.
  • Shan Liu
    Department of Radiology, General Hospital of Ningxia Medical University, Yinchuan, China.
  • Hongqi Zhang
    China International Neuroscience Institute (China-INI), Beijing, China. xwzhanghq@163.com; qinlan@unionstrongtech.com
  • Hao Lv
    Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China.
  • Hongyan Lai
    Department of Medical Oncology, Shanghai Key Laboratory of Medical Epigenetics, Fudan University Shanghai Cancer Center, Institutes of Biomedical Sciences, Fudan University, 270 Dong An Rd, Shanghai, 200032, China.