Identifying Protein-Nucleotide Binding Residues via Grouped Multi-task Learning and Pre-trained Protein Language Models.

Journal: Journal of chemical information and modeling
PMID:

Abstract

The accurate identification of protein-nucleotide binding residues is crucial for protein function annotation and drug discovery. Numerous computational methods have been proposed to predict these binding residues, achieving remarkable performance. However, due to the limited availability and high variability of nucleotides, predicting binding residues for diverse nucleotides remains a significant challenge. To address these, we propose NucGMTL, a new grouped deep multi-task learning approach designed for predicting binding residues of all observed nucleotides in the BioLiP database. NucGMTL leverages pre-trained protein language models to generate robust sequence embedding and incorporates multi-scale learning along with scale-based self-attention mechanisms to capture a broader range of feature dependencies. To effectively harness the shared binding patterns across various nucleotides, deep multi-task learning is utilized to distill common representations, taking advantage of auxiliary information from similar nucleotides selected based on task grouping. Performance evaluation on benchmark data sets shows that NucGMTL achieves an average area under the Precision-Recall curve (AUPRC) of 0.594, surpassing other state-of-the-art methods. Further analyses highlight that the predominant advantage of NucGMTL can be reflected by its effective integration of grouped multi-task learning and pre-trained protein language models. The data set and source code are freely accessible at: https://github.com/jerry1984Y/NucGMTL.

Authors

  • Jiashun Wu
  • Yan Liu
    Department of Clinical Microbiology, Shanghai Tenth People's Hospital, School of Medicine, Tongji University, Shanghai, 200072, People's Republic of China.
  • Ying Zhang
    Department of Nephrology, Nanchong Central Hospital Affiliated to North Sichuan Medical College, Nanchong, China.
  • Xiaoyu Wang
    Department of Statistics Florida State University Tallahassee, FL, USA.
  • He Yan
  • Yiheng Zhu
    College of Computer Science and Technology, Zhejiang University, Hangzhou 310000, China.
  • Jiangning Song
    College of Information Engineering, Northwest A&F University, Yangling 712100, China, Department of Biochemistry and Molecular Biology, Monash University, Melbourne, VIC 3800, Australia, National Engineering Laboratory for Industrial Enzymes and Key Laboratory of Systems Microbial Biotechnology, Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin, China, Centre for Research in Intelligent Systems, Faculty of Information Technology, Monash University, Melbourne, VIC 3800, Australia and ARC Centre of Excellence in Advanced Molecular Imaging, Monash University, Melbourne, VIC 3800, Australia College of Information Engineering, Northwest A&F University, Yangling 712100, China, Department of Biochemistry and Molecular Biology, Monash University, Melbourne, VIC 3800, Australia, National Engineering Laboratory for Industrial Enzymes and Key Laboratory of Systems Microbial Biotechnology, Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin, China, Centre for Research in Intelligent Systems, Faculty of Information Technology, Monash University, Melbourne, VIC 3800, Australia and ARC Centre of Excellence in Advanced Molecular Imaging, Monash University, Melbourne, VIC 3800, Australia College of Information Engineering, Northwest A&F University, Yangling 712100, China, Department of Biochemistry and Molecular Biology, Monash University, Melbourne, VIC 3800, Australia, National Engineering Laboratory for Industrial Enzymes and Key Laboratory of Systems Microbial Biotechnology, Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin, China, Centre for Research in Intelligent Systems, Faculty of Information Technology, Monash University, Melbourne, VIC 3800, Australia and ARC Centre of Excellence in Advanced Molecular Imaging, Monash University, Melbourne, VIC 3800, Australia.
  • Dong-Jun Yu