Teaching AI to speak protein.

Journal: Current Opinion in Structural Biology
PMID:

Abstract

Large Language Models for proteins, namely protein Language Models (pLMs), have begun to provide an important alternative for capturing the information encoded in a protein sequence in computers. Arguably, pLMs have made important advances toward understanding aspects of the language of life as written in proteins, and through this understanding they are becoming an increasingly powerful means of improving protein prediction, e.g., the prediction of molecular function as expressed by identifying binding residues or variant effects. Although it benefits from the same technology, protein structure prediction remains one of the few applications in which embeddings computed by pLMs from single sequences alone appear neither to improve over nor to match the state-of-the-art. Fine-tuning foundation pLMs enhances the efficiency and accuracy of solutions, particularly when few experimental annotations are available. pLMs facilitate the integration of computational and experimental biology, of AI and wet lab, in particular toward a new era of protein design.
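To illustrate what "pLM embeddings from single sequences" means in practice, the following is a minimal sketch, not code from this article: it assumes the publicly available ProtT5 encoder checkpoint (Rostlab/prot_t5_xl_half_uniref50-enc) and the Hugging Face transformers API, and extracts one embedding vector per residue that downstream predictors (e.g., for binding residues or variant effects) could consume.

```python
# Minimal sketch (assumption: ProtT5 encoder via Hugging Face transformers,
# not the authors' own pipeline) for per-residue embeddings from a single sequence.
import re
import torch
from transformers import T5EncoderModel, T5Tokenizer

model_name = "Rostlab/prot_t5_xl_half_uniref50-enc"  # assumed public checkpoint
tokenizer = T5Tokenizer.from_pretrained(model_name, do_lower_case=False)
model = T5EncoderModel.from_pretrained(model_name).float().eval()  # float() for CPU

sequence = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"  # toy example sequence
# ProtT5 expects space-separated residues; rare amino acids are mapped to X
seq_spaced = " ".join(re.sub(r"[UZOB]", "X", sequence))

inputs = tokenizer(seq_spaced, return_tensors="pt")
with torch.no_grad():
    hidden = model(**inputs).last_hidden_state  # shape: (1, L+1, 1024), incl. EOS token

# Drop the trailing special token to keep one 1024-d vector per residue;
# these vectors feed residue-level predictors (binding sites, variant effects, ...).
per_residue = hidden[0, : len(sequence)]
print(per_residue.shape)  # torch.Size([33, 1024])
```

Such embeddings are computed from the single sequence alone, i.e., without multiple sequence alignments, which is the setting the abstract contrasts with state-of-the-art structure prediction.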

Authors

  • Michael Heinzinger
    Department of Informatics, Bioinformatics & Computational Biology - i12, TUM (Technical University of Munich), Boltzmannstr. 3, 85748, Garching/Munich, Germany. mheinzinger@rostlab.org.
  • Burkhard Rost