Highly accurate protein structure prediction for the human proteome.

Journal: Nature
PMID:

Abstract

Protein structures can provide invaluable information, both for reasoning about biological processes and for enabling interventions such as structure-based drug development or targeted mutagenesis. After decades of effort, 17% of the total residues in human protein sequences are covered by an experimentally determined structure. Here we markedly expand the structural coverage of the proteome by applying the state-of-the-art machine learning method, AlphaFold, at a scale that covers almost the entire human proteome (98.5% of human proteins). The resulting dataset covers 58% of residues with a confident prediction, of which a subset (36% of all residues) have very high confidence. We introduce several metrics developed by building on the AlphaFold model and use them to interpret the dataset, identifying strong multi-domain predictions as well as regions that are likely to be disordered. Finally, we provide some case studies to illustrate how high-quality predictions could be used to generate biological hypotheses. We are making our predictions freely available to the community and anticipate that routine large-scale and high-accuracy structure prediction will become an important tool that will allow new questions to be addressed from a structural perspective.
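The coverage figures above are nested fractions of all residues: the very-high-confidence residues are a subset of the confident ones. A minimal sketch of that bookkeeping follows, assuming each residue carries a per-residue confidence score on a 0 to 100 scale. The cutoffs of 70 and 90, the function name `coverage_by_confidence`, and the example scores are illustrative assumptions; the abstract states none of them.

```python
from typing import Dict, Sequence

# Assumed cutoffs on a 0-100 per-residue confidence score.
# They are illustrative; the abstract does not give threshold values.
CONFIDENT_CUTOFF = 70.0
VERY_HIGH_CUTOFF = 90.0


def coverage_by_confidence(
    scores: Sequence[float],
    confident: float = CONFIDENT_CUTOFF,
    very_high: float = VERY_HIGH_CUTOFF,
) -> Dict[str, float]:
    """Return the fraction of residues at or above each confidence cutoff.

    Both fractions share the same denominator (all residues), so the
    very-high fraction can never exceed the confident fraction.
    """
    total = len(scores)
    if total == 0:
        return {"confident": 0.0, "very_high": 0.0}
    n_confident = sum(1 for s in scores if s >= confident)
    n_very_high = sum(1 for s in scores if s >= very_high)
    return {
        "confident": n_confident / total,
        "very_high": n_very_high / total,
    }


# Made-up scores for ten residues, purely for illustration.
example_scores = [95.0, 92.5, 88.0, 74.0, 61.0, 45.0, 30.0, 91.0, 72.0, 55.0]
result = coverage_by_confidence(example_scores)
print(result)
```

Applied to the scores of every residue in a proteome rather than to one toy list, the same two fractions would correspond to the "confident" and "very high confidence" coverage numbers quoted in the abstract.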

Authors

  • Kathryn Tunyasuvunakool
    DeepMind, London, UK. ktkool@deepmind.com.
  • Jonas Adler
  • Zachary Wu
    Division of Chemistry and Chemical Engineering, California Institute of Technology, Pasadena, CA, USA.
  • Tim Green
    DeepMind, London, UK.
  • Michal Zielinski
    DeepMind, London, UK.
  • Augustin Žídek
    DeepMind, London, UK.
  • Alex Bridgland
    DeepMind, London, UK.
  • Andrew Cowie
    DeepMind, London, UK.
  • Clemens Meyer
    DeepMind, London, UK.
  • Agata Laydon
    DeepMind, London, UK.
  • Sameer Velankar
    European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, UK.
  • Gerard J Kleywegt
    European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, UK.
  • Alex Bateman
    European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridge, UK.
  • Richard Evans
    DeepMind, London, UK.
  • Alexander Pritzel
    DeepMind, London, UK.
  • Michael Figurnov
    DeepMind, London, UK.
  • Olaf Ronneberger
    DeepMind, London, EC4A 3TW, UK.
  • Russ Bates
    DeepMind, London, UK.
  • Simon A A Kohl
    Division of Medical Image Computing, German Cancer Research Center, Heidelberg, Germany.
  • Anna Potapenko
    DeepMind, London, UK.
  • Andrew J Ballard
    DeepMind, London, UK.
  • Bernardino Romera-Paredes
    DeepMind, London, UK.
  • Stanislav Nikolov
    DeepMind, London, UK.
  • Rishub Jain
    DeepMind, London, UK.
  • Ellen Clancy
    DeepMind, London, UK.
  • David Reiman
    DeepMind, London, UK.
  • Stig Petersen
    Google DeepMind, 5 New Street Square, London EC4A 3TW, UK.
  • Andrew W Senior
    DeepMind, London, UK.
  • Koray Kavukcuoglu
    Google DeepMind, 5 New Street Square, London EC4A 3TW, UK.
  • Ewan Birney
    European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, UK.
  • Pushmeet Kohli
    DeepMind, London, UK.
  • John Jumper
    DeepMind, London, UK.
  • Demis Hassabis
    Google DeepMind, 5 New Street Square, London EC4A 3TW, UK.