AlphaFold Protein Structure Database in 2024: providing structure coverage for over 214 million protein sequences.

Journal: Nucleic acids research
PMID:

Abstract

The AlphaFold Protein Structure Database (AlphaFold DB, https://alphafold.ebi.ac.uk) has significantly impacted structural biology by amassing over 214 million predicted protein structures, expanding from the initial 300k structures released in 2021. Enabled by the groundbreaking AlphaFold2 artificial intelligence (AI) system, the predictions archived in AlphaFold DB have been integrated into primary data resources such as PDB, UniProt, Ensembl, InterPro and MobiDB. Our manuscript details subsequent enhancements in data archiving, covering successive releases encompassing model organisms, global health proteomes, Swiss-Prot integration, and a host of curated protein datasets. We detail the data access mechanisms of AlphaFold DB, from direct file access via FTP to advanced queries using Google Cloud Public Datasets and the programmatic access endpoints of the database. We also discuss the improvements and services added since its initial release, including enhancements to the Predicted Aligned Error viewer, customisation options for the 3D viewer, and improvements in the search engine of AlphaFold DB.
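
As an illustration of the programmatic access mentioned above, the following is a minimal sketch rather than the authors' own code. It assumes the database exposes a per-entry endpoint of the form /api/prediction/{UniProt accession} under the site URL given in the abstract; the endpoint path, the helper names prediction_url and fetch_prediction, and the simple accession check are assumptions made for illustration and should be verified against the current API documentation.

```python
import json
import urllib.request

# Assumed base path for the programmatic endpoints; the abstract only gives
# the site URL, so verify this against the AlphaFold DB API documentation.
API_BASE = "https://alphafold.ebi.ac.uk/api"


def prediction_url(uniprot_accession: str) -> str:
    """Build the (assumed) prediction endpoint URL for a UniProt accession."""
    accession = uniprot_accession.strip().upper()
    # Simplified sanity check: plain UniProt accessions are alphanumeric.
    if not accession.isalnum():
        raise ValueError(f"unexpected accession format: {uniprot_accession!r}")
    return f"{API_BASE}/prediction/{accession}"


def fetch_prediction(uniprot_accession: str, timeout: float = 30.0):
    """Download and decode the JSON metadata for one entry (needs network)."""
    url = prediction_url(uniprot_accession)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)


if __name__ == "__main__":
    # P00520 is used purely as an example accession.
    print(prediction_url("P00520"))
```

Calling fetch_prediction("P00520") would perform the actual request; the fields of the returned JSON are not described in the abstract, so they are best inspected directly rather than assumed.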

Authors

  • Mihaly Varadi
    Protein Data Bank in Europe, European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, UK. Electronic address: mvaradi@ebi.ac.uk.
  • Damian Bertoni
    European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, UK.
  • Paulyna Magana
    European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, UK.
  • Urmila Paramval
    European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, UK.
  • Ivanna Pidruchna
    European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, UK.
  • Malarvizhi Radhakrishnan
    European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, UK.
  • Maxim Tsenkov
    European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, UK.
  • Sreenath Nair
    European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, UK.
  • Milot Mirdita
    School of Biological Sciences, Seoul National University, Seoul, South Korea.
  • Jingi Yeo
    School of Biological Sciences, Seoul National University, Seoul, South Korea.
  • Oleg Kovalevskiy
    Google DeepMind, London, UK.
  • Kathryn Tunyasuvunakool
    DeepMind, London, UK. ktkool@deepmind.com.
  • Agata Laydon
    DeepMind, London, UK.
  • Augustin Žídek
    DeepMind, London, UK.
  • Hamish Tomlinson
    Google DeepMind, London, UK.
  • Dhavanthi Hariharan
    Google DeepMind, London, UK.
  • Josh Abrahamson
    Google DeepMind, London, UK.
  • Tim Green
    DeepMind, London, UK.
  • John Jumper
    DeepMind, London, UK.
  • Ewan Birney
    European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, UK.
  • Martin Steinegger
    School of Biological Sciences, Seoul National University, Seoul, South Korea.
  • Demis Hassabis
    Google DeepMind, 5 New Street Square, London EC4A 3TW, UK.
  • Sameer Velankar
    European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, UK.