Uncovering new families and folds in the natural protein universe.

Journal: Nature
PMID:

Abstract

We are now entering a new era in protein sequence and structure annotation, with hundreds of millions of predicted protein structures made available through the AlphaFold database. These models cover nearly all known proteins, including those that are challenging to annotate for function or putative biological role using standard homology-based approaches. In this study, we examine the extent to which the AlphaFold database has structurally illuminated this 'dark matter' of the natural protein universe at high predicted accuracy. We further describe the protein diversity that these models cover as an annotated interactive sequence similarity network, accessible at https://uniprot3d.org/atlas/AFDB90v4. By searching for novelties from sequence, structure and semantic perspectives, we uncovered the β-flower fold, added several protein families to the Pfam database and experimentally demonstrated that one of these belongs to a new superfamily of translation-targeting toxin-antitoxin systems, TumE-TumA. This work underscores the value of large-scale efforts in identifying, annotating and prioritizing new protein families. By leveraging the recent deep learning revolution in protein bioinformatics, we can now shed light on uncharted areas of the protein universe at an unprecedented scale, paving the way to innovations in life sciences and biotechnology.

Authors

  • Janani Durairaj
    Bioinformatics Group, Department of Plant Sciences, Wageningen University and Research, Wageningen, The Netherlands.
  • Andrew M Waterhouse
    Biozentrum, University of Basel, Basel, Switzerland.
  • Toomas Mets
    Institute of Technology, University of Tartu, Tartu, Estonia.
  • Tetiana Brodiazhenko
    Institute of Technology, University of Tartu, Tartu, Estonia.
  • Minhal Abdullah
    Institute of Technology, University of Tartu, Tartu, Estonia.
  • Gabriel Studer
    Biozentrum, University of Basel, Basel, Switzerland.
  • Gerardo Tauriello
    Biozentrum, University of Basel, Basel, Switzerland.
  • Mehmet Akdel
    VantAI, New York, NY, USA.
  • Antonina Andreeva
    European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Hinxton, UK.
  • Alex Bateman
    European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridge, UK.
  • Tanel Tenson
    Institute of Technology, University of Tartu, Tartu, Estonia.
  • Vasili Hauryliuk
    Institute of Technology, University of Tartu, Tartu, Estonia.
  • Torsten Schwede
    Biozentrum, University of Basel, Basel, Switzerland.
  • Joana Pereira
    European Molecular Biology Laboratory, c/o DESY, Notkestrasse 85, 22607 Hamburg, Germany.