Enzyme function prediction using contrastive learning.

Journal: Science (New York, N.Y.)
PMID:

Abstract

Enzyme function annotation is a fundamental challenge, and numerous computational tools have been developed. However, most of these tools cannot accurately predict functional annotations, such as enzyme commission (EC) numbers, for less-studied proteins or those with previously uncharacterized functions or multiple activities. We present a machine learning algorithm named CLEAN (contrastive learning-enabled enzyme annotation) to assign EC numbers to enzymes with better accuracy, reliability, and sensitivity compared with the state-of-the-art tool BLASTp. The contrastive learning framework empowers CLEAN to confidently (i) annotate understudied enzymes, (ii) correct mislabeled enzymes, and (iii) identify promiscuous enzymes with two or more EC numbers. We demonstrate these three capabilities through systematic in silico and in vitro experiments. We anticipate that this tool will be widely used for predicting the functions of uncharacterized enzymes, thereby advancing many fields, such as genomics, synthetic biology, and biocatalysis.
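
The abstract does not give the loss function or the inference rule, so the following is only a minimal sketch of how contrastive learning could support EC assignment in general, not CLEAN's actual implementation. It assumes a triplet margin loss (pull an enzyme toward one with the same EC number, push it away from one with a different EC number) and a nearest-centroid lookup in the learned embedding space. All function names, the toy two-dimensional embeddings, the margin, and the distance cutoff are hypothetical.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def triplet_margin_loss(anchor, positive, negative, margin=1.0):
    """Zero once the negative is at least `margin` farther away than the positive."""
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)

def ec_centroids(embeddings_by_ec):
    """Average the embeddings of all enzymes that share an EC number."""
    centroids = {}
    for ec, vectors in embeddings_by_ec.items():
        dim = len(vectors[0])
        centroids[ec] = [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]
    return centroids

def assign_ec(query, centroids, max_distance=None):
    """Return the nearest EC number, or every EC number within `max_distance`.

    The cutoff is an assumed, simplistic way to allow more than one EC number
    per enzyme (the promiscuous case).
    """
    ranked = sorted((euclidean(query, c), ec) for ec, c in centroids.items())
    if max_distance is None:
        return [ranked[0][1]]
    return [ec for dist, ec in ranked if dist <= max_distance]

# Toy embeddings (made up); in practice these would come from a trained encoder.
toy_embeddings = {
    "1.1.1.1": [[0.0, 0.0], [0.2, 0.0]],
    "3.1.1.3": [[1.0, 1.0], [1.0, 0.8]],
}
centroids = ec_centroids(toy_embeddings)
single = assign_ec([0.1, 0.1], centroids)
multiple = assign_ec([0.55, 0.45], centroids, max_distance=0.7)
```

In this sketch a query close to one cluster receives a single EC number, while a query lying between clusters can receive several when a distance cutoff is supplied.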

Authors

  • Tianhao Yu
    Department of Chemical and Biomolecular Engineering, University of Illinois at Urbana-Champaign, Urbana-Champaign, IL, USA.
  • Haiyang Cui
    Department of Chemical and Biomolecular Engineering, University of Illinois Urbana-Champaign, Urbana, IL 61801, USA.
  • Jianan Canal Li
    National Science Foundation Molecule Maker Lab Institute, University of Illinois Urbana-Champaign, Urbana, IL 61801, USA.
  • Yunan Luo
    School of Computational Science and Engineering, Georgia Institute of Technology, Atlanta, GA, USA.
  • Guangde Jiang
    Department of Chemical and Biomolecular Engineering, University of Illinois at Urbana-Champaign, Urbana-Champaign, IL, USA.
  • Huimin Zhao
    Department of Chemical and Biomolecular Engineering, University of Illinois at Urbana-Champaign, Urbana, IL, USA. zhao5@illinois.edu.