Evolutionary-scale prediction of atomic-level protein structure with a language model.

Journal: Science (New York, N.Y.)
Published Date:

Abstract

Recent advances in machine learning have leveraged evolutionary information in multiple sequence alignments to predict protein structure. We demonstrate direct inference of full atomic-level protein structure from primary sequence using a large language model. As language models of protein sequences are scaled up to 15 billion parameters, an atomic-resolution picture of protein structure emerges in the learned representations. This results in an order-of-magnitude acceleration of high-resolution structure prediction, which enables large-scale structural characterization of metagenomic proteins. We apply this capability to construct the ESM Metagenomic Atlas by predicting structures for >617 million metagenomic protein sequences, including >225 million that are predicted with high confidence, which gives a view into the vast breadth and diversity of natural proteins.

Authors

  • Zeming Lin
    Department of Computer Science, New York University, New York, NY 10012.
  • Halil Akin
    FAIR, Meta AI, New York, NY, USA.
  • Roshan Rao
    FAIR, Meta AI, New York, NY, USA.
  • Brian Hie
    FAIR, Meta AI, New York, NY, USA.
  • Zhongkai Zhu
    FAIR, Meta AI, New York, NY, USA.
  • Wenting Lu
    Zhujiang Hospital, Southern Medical University, 253 Gongye Road, Guangzhou, Guangdong 510280, China. Electronic address: luwenting23@163.com.
  • Nikita Smetanin
    FAIR, Meta AI, New York, NY, USA.
  • Robert Verkuil
    FAIR, Meta AI, New York, NY, USA.
  • Ori Kabeli
    FAIR, Meta AI, New York, NY, USA.
  • Yaniv Shmueli
    FAIR, Meta AI, New York, NY, USA.
  • Allan Dos Santos Costa
    Massachusetts Institute of Technology, Cambridge, MA, USA.
  • Maryam Fazel-Zarandi
    FAIR, Meta AI, New York, NY, USA.
  • Tom Sercu
    IBM Thomas J. Watson Research Center, Yorktown Heights, NY, USA.
  • Salvatore Candido
    FAIR, Meta AI, New York, NY, USA.
  • Alexander Rives
    Facebook AI Research, New York, NY 10003; arives@cs.nyu.edu.