Structure-based protein function prediction using graph convolutional networks.

Journal: Nature Communications
Published Date:

Abstract

The rapid increase in the number of proteins in sequence databases and the diversity of their functions challenge computational approaches for automated function prediction. Here, we introduce DeepFRI, a Graph Convolutional Network that predicts protein functions by leveraging sequence features extracted from a protein language model together with protein structures. It outperforms current leading methods and sequence-based Convolutional Neural Networks, and it scales to the size of current sequence repositories. Augmenting the training set of experimental structures with homology models allows us to significantly expand the number of predictable functions. DeepFRI has significant de-noising capability, with only a minor drop in performance when experimental structures are replaced by protein models. Class activation mapping enables function predictions at an unprecedented resolution, allowing site-specific annotations at the residue level in an automated manner. We show the utility and high performance of our method by annotating structures from the PDB and SWISS-MODEL, making several new confident function predictions. DeepFRI is available as a webserver: https://beta.deepfri.flatironinstitute.org/
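
To illustrate the general idea of a graph convolution over a protein structure, here is a minimal sketch in plain Python. It is not the DeepFRI implementation: the function names, the 10 Å contact threshold, the single-layer architecture, the sum pooling, and the random weights and inputs are all assumptions made for illustration. Per-residue features (standing in for language-model embeddings) are propagated over a residue contact graph, pooled into one protein-level vector, and mapped to independent per-function probabilities.

```python
import math
import random


def contact_map(coords, threshold=10.0):
    """Binary residue contact map from 3D coordinates.

    The threshold (in angstroms) is an assumed value. The diagonal is 1,
    so every residue has a self-loop.
    """
    n = len(coords)
    return [[1.0 if math.dist(coords[i], coords[j]) < threshold else 0.0
             for j in range(n)] for i in range(n)]


def normalize_adjacency(adj):
    """Symmetric normalization D^(-1/2) A D^(-1/2)."""
    deg = [sum(row) for row in adj]  # always >= 1 because of self-loops
    n = len(adj)
    return [[adj[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]


def matmul(a, b):
    """Dense matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def graph_conv(adj_norm, features, weights):
    """One graph convolution layer: ReLU(A_norm @ X @ W)."""
    out = matmul(matmul(adj_norm, features), weights)
    return [[max(v, 0.0) for v in row] for row in out]


def predict_functions(coords, features, w_conv, w_out):
    """Contact graph -> graph convolution -> sum pooling -> sigmoid outputs."""
    adj_norm = normalize_adjacency(contact_map(coords))
    hidden = graph_conv(adj_norm, features, w_conv)
    pooled = [sum(col) for col in zip(*hidden)]  # one vector per protein
    logits = matmul([pooled], w_out)[0]
    return [1.0 / (1.0 + math.exp(-z)) for z in logits]


# Toy inputs: random coordinates, features, and untrained weights (hypothetical).
random.seed(0)
n_res, n_feat, n_hidden, n_labels = 30, 16, 8, 4
coords = [tuple(random.gauss(0.0, 15.0) for _ in range(3)) for _ in range(n_res)]
features = [[random.gauss(0.0, 1.0) for _ in range(n_feat)] for _ in range(n_res)]
w_conv = [[random.gauss(0.0, 0.1) for _ in range(n_hidden)] for _ in range(n_feat)]
w_out = [[random.gauss(0.0, 0.1) for _ in range(n_labels)] for _ in range(n_hidden)]

probs = predict_functions(coords, features, w_conv, w_out)
print(probs)  # one probability per (hypothetical) function label
```

Because each output passes through its own sigmoid, the labels are treated as independent, which suits a setting where one protein can carry several functions at once. A trained model would learn `w_conv` and `w_out` from annotated proteins rather than drawing them at random.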

Authors

  • Vladimir Gligorijević
    Center for Computational Biology, Flatiron Institute, New York, NY, USA. vgligorijevic@flatironinstitute.org.
  • P Douglas Renfrew
    Center for Computational Biology, Flatiron Institute, New York, NY, USA.
  • Tomasz Kosciolek
    Department of Computer Science, Bioinformatics Group, University College London, Gower Street, London, WC1E 6BT, United Kingdom.
  • Julia Koehler Leman
    Center for Computational Biology, Flatiron Institute, New York, NY, USA.
  • Daniel Berenberg
    Center for Computational Biology, Flatiron Institute, New York, NY, USA.
  • Tommi Vatanen
    Broad Institute of MIT and Harvard, Cambridge, MA, USA.
  • Chris Chandler
    Center for Computational Biology, Flatiron Institute, New York, NY, USA.
  • Bryn C Taylor
    Biomedical Sciences Graduate Program, University of California San Diego, La Jolla, CA, USA.
  • Ian M Fisk
    Scientific Computing Core, Flatiron Institute, Simons Foundation, New York, NY, USA.
  • Hera Vlamakis
    Broad Institute of MIT and Harvard, Cambridge, MA, USA.
  • Ramnik J Xavier
    Broad Institute of MIT and Harvard, Cambridge, MA, USA.
  • Rob Knight
    Department of Pediatrics, University of California, San Diego School of Medicine, La Jolla, CA 92093, USA; Center for Microbiome Innovation, Jacobs School of Engineering, University of California, San Diego, La Jolla, CA 92093, USA; Department of Computer Science and Engineering, Jacobs School of Engineering, University of California San Diego, La Jolla, CA 92093, USA.
  • Kyunghyun Cho
    Department of Information and Computer Science, Aalto University School of Science, Finland.
  • Richard Bonneau
    Department of Biology, Center for Genomics & Systems Biology, New York University, New York, NY 10003, Perinatology Research Branch, Eunice Kennedy Shriver National Institute of Child Health and Human Development, NIH, Bethesda, MD and Detroit, MI 48201, USA, IBM Thomas J. Watson Research Center, Yorktown Heights, NY 10598, Computer Science Department, Courant Institute of Mathematical Sciences, New York University, New York, NY 10012 and Department of Computer Science, Wayne State University, Detroit, MI 48202, USA.