LambdaPP: Fast and accessible protein-specific phenotype predictions.

Journal: Protein science : a publication of the Protein Society
Published Date:

Abstract

The availability of accurate and fast artificial intelligence (AI) solutions predicting aspects of proteins is revolutionizing experimental and computational molecular biology. The webserver LambdaPP aspires to supersede PredictProtein, which in 1992 became the first internet server to make AI protein predictions available. Given a protein sequence as input, LambdaPP provides easily accessible visualizations of protein 3D structure, along with predictions, delivered in seconds, at the protein level (GeneOntology, subcellular location) and at the residue level (binding to metal ions, small molecules, and nucleotides; conservation; intrinsic disorder; secondary structure; alpha-helical and beta-barrel transmembrane segments; signal peptides; variant effect). The structure prediction provided by LambdaPP, which leverages ColabFold and is computed in minutes, is based on MMseqs2 multiple sequence alignments. All other feature prediction methods are based on the protein language model (pLM) ProtT5. Queried by a protein sequence, LambdaPP computes protein and residue predictions almost instantly for various phenotypes, including 3D structure and aspects of protein function. LambdaPP is freely available for everyone to use at embed.predictprotein.org; the interactive results for the case study can be found at https://embed.predictprotein.org/o/Q9NZC2. The frontend of LambdaPP can be found on GitHub (github.com/sacdallago/embed.predictprotein.org) and can be freely used and distributed under the academic free use license (AFL-2). For high-throughput applications, all methods can be executed locally via the bio-embeddings (bioembeddings.com) Python package or the Docker image at ghcr.io/bioembeddings/bio_embeddings, which also includes the backend of LambdaPP.

Authors

  • Tobias Olenyi
    TUM (Technical University of Munich) Department of Informatics, Bioinformatics & Computational Biology, Garching/Munich, Germany.
  • Céline Marquet
    TUM (Technical University of Munich) Department of Informatics, Bioinformatics & Computational Biology - i12, Garching, Germany.
  • Michael Heinzinger
    Department of Informatics, Bioinformatics & Computational Biology - i12, TUM (Technical University of Munich), Boltzmannstr. 3, 85748, Garching/Munich, Germany. mheinzinger@rostlab.org.
  • Benjamin Kröger
    TUM (Technical University of Munich) Department of Informatics, Bioinformatics & Computational Biology - i12, Garching, Germany.
  • Tiha Nikolova
    TUM (Technical University of Munich) Department of Informatics, Bioinformatics & Computational Biology - i12, Garching, Germany.
  • Michael Bernhofer
    Department of Informatics & Center for Bioinformatics & Computational Biology - i12, Technische Universität München (TUM), Boltzmannstr. 3, Garching/Munich, 85748, Germany. Michael.Bernhofer@mytum.de.
  • Philip Sändig
    TUM (Technical University of Munich) Department of Informatics, Bioinformatics & Computational Biology - i12, Garching, Germany.
  • Konstantin Schütze
    TUM (Technical University of Munich) Department of Informatics, Bioinformatics & Computational Biology, Garching/Munich, Germany.
  • Maria Littmann
    TUM (Technical University of Munich) Department of Informatics, Bioinformatics & Computational Biology, Garching/Munich, Germany.
  • Milot Mirdita
    School of Biological Sciences, Seoul National University, Seoul, South Korea.
  • Martin Steinegger
    School of Biological Sciences, Seoul National University, Seoul, South Korea.
  • Christian Dallago
    Department of Informatics, Bioinformatics & Computational Biology - i12, TUM (Technical University of Munich), Boltzmannstr. 3, 85748, Garching/Munich, Germany.
  • Burkhard Rost