Fast protein structure comparison through effective representation learning with contrastive graph neural networks.

Journal: PLoS computational biology
Published Date:

Abstract

Protein structure alignment algorithms are often time-consuming, which makes large-scale similarity-based retrieval of protein structures challenging. As the number of protein structures increases rapidly, there is an urgent need for more efficient structure comparison approaches. In this paper, we propose an effective graph-based protein structure representation learning method, GraSR, for fast and accurate structure comparison. In GraSR, a graph is constructed based on the intra-residue distance derived from the tertiary structure. Then, deep graph neural networks (GNNs) with a short-cut connection learn graph representations of the tertiary structures under a contrastive learning framework. To further improve GraSR, a novel dynamic training data partition strategy and a length-scaling cosine distance are introduced. We objectively evaluate GraSR on SCOPe v2.07 and on a newly released independent test set from the PDB database, using a comprehensive performance metric that we designed. Compared with other state-of-the-art methods, GraSR achieves about 7%-10% improvement on the two benchmark datasets. GraSR is also much faster than alignment-based methods. We dig into the model and observe that the superiority of GraSR stems mainly from the learned discriminative residue-level and global descriptors. The web server and source code of GraSR are freely available at www.csbio.sjtu.edu.cn/bioinf/GraSR/ for academic use.
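
The two ideas that make descriptor-based comparison fast, a graph built from residue distances and a cosine distance between fixed-length descriptors, can be illustrated with a minimal sketch. This is not the GraSR implementation: the function names, the 10 Å cutoff, and the toy descriptors are assumptions made for illustration, the GNN encoder is omitted entirely, and plain cosine distance stands in for the length-scaling variant, whose formula is not given in the abstract.

```python
import math


def distance_graph(coords, cutoff=10.0):
    """Build adjacency lists linking residues whose coordinates lie within `cutoff`.

    `coords` holds one 3D point per residue (for example, C-alpha positions).
    The 10.0 cutoff is an assumed, illustrative value, not one taken from the paper.
    """
    n = len(coords)
    adj = [[] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(coords[i], coords[j]) <= cutoff:
                adj[i].append(j)
                adj[j].append(i)
    return adj


def cosine_distance(u, v):
    """Plain cosine distance (1 - cosine similarity) between two descriptors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    if norm == 0.0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / norm


def rank_by_similarity(query, database):
    """Return database entry names sorted from most to least similar to `query`.

    `database` maps a structure name to its precomputed descriptor, so a query
    costs one vector comparison per entry rather than one structure alignment.
    """
    return sorted(database, key=lambda name: cosine_distance(query, database[name]))
```

Because each structure is reduced to a descriptor once, retrieval only compares vectors, which is the source of the speed advantage over alignment-based methods that the abstract describes. For example, `rank_by_similarity(q, db)` with hypothetical descriptors `q` and `db` returns the entry names ordered by increasing distance.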

Authors

  • Chunqiu Xia
    Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, and Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai, China.
  • Shi-Hao Feng
    Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai 200240, China.
  • Ying Xia
    Australian e-Health Research Centre, CSIRO, Brisbane, QLD, 4029, Australia.
  • Xiaoyong Pan
    Department of Veterinary Clinical and Animal Sciences, University of Copenhagen, Copenhagen, Denmark. xypan172436@gmail.com.
  • Hong-Bin Shen
    Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, Shanghai, 200240, People's Republic of China. hbshen@sjtu.edu.cn.