Functional annotation of enzyme-encoding genes using deep learning with transformer layers.

Journal: Nature Communications
PMID:

Abstract

Functional annotation of open reading frames in microbial genomes remains substantially incomplete. Enzymes constitute the most prevalent functional gene class in microbial genomes, and their specific catalytic functions can be described by Enzyme Commission (EC) numbers. Consequently, the ability to predict EC numbers could substantially reduce the number of un-annotated genes. Here we present a deep learning model, DeepECtransformer, which uses transformer layers in its neural network architecture to predict EC numbers. Applied to the extensively studied Escherichia coli K-12 MG1655 genome, DeepECtransformer predicted EC numbers for 464 un-annotated genes. We experimentally validated the enzymatic activities predicted for three proteins (YgfF, YciO, and YjdM). Further examination of the neural network's reasoning process revealed that the trained network relies on functional motifs of enzymes to predict EC numbers. DeepECtransformer thus facilitates the functional annotation of uncharacterized genes.
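The abstract treats the EC number as the prediction target. An EC number is a hierarchical four-level code (class, subclass, sub-subclass, serial number), and annotations often leave deeper levels unassigned with "-" (for example, 1.1.1.-). As an illustration only, the sketch below splits an EC string into its four levels. The helpers `parse_ec` and `is_complete` are hypothetical names introduced here, not part of DeepECtransformer; the sketch assumes standard dotted notation and the seven top-level EC classes, and it is not the authors' implementation.

```python
from typing import Optional, Tuple

# Four EC levels: class, subclass, sub-subclass, serial number (None = unassigned "-").
ECLevels = Tuple[int, Optional[int], Optional[int], Optional[int]]


def parse_ec(text: str) -> ECLevels:
    """Split an EC number such as 'EC 2.7.1.1' or '1.1.1.-' into its four levels."""
    body = text.strip()
    if body.upper().startswith("EC"):
        body = body[2:].strip()
    fields = body.split(".")
    if len(fields) != 4:
        raise ValueError(f"expected four dot-separated fields: {text!r}")

    levels: list = []
    for field in fields:
        if field == "-":
            levels.append(None)  # level not (yet) assigned
        elif field.isdigit():
            levels.append(int(field))
        else:
            raise ValueError(f"invalid EC field {field!r} in {text!r}")

    # Assumption: the top-level class must be one of the seven EC classes (1-7).
    if levels[0] is None or not 1 <= levels[0] <= 7:
        raise ValueError(f"enzyme class must be 1-7: {text!r}")
    # Once a level is unassigned, every deeper level must be unassigned too.
    for shallower, deeper in zip(levels, levels[1:]):
        if shallower is None and deeper is not None:
            raise ValueError(f"assigned level follows '-': {text!r}")
    return (levels[0], levels[1], levels[2], levels[3])


def is_complete(ec: ECLevels) -> bool:
    """True when all four levels are assigned, i.e. a fully specified EC number."""
    return all(level is not None for level in ec)


print(parse_ec("EC 2.7.1.1"))  # (2, 7, 1, 1)
print(parse_ec("1.1.1.-"))     # (1, 1, 1, None)
```

Mapping "-" to None keeps partial annotations representable while still rejecting malformed strings such as 1.-.1.1, which makes a level-by-level comparison between a predicted and a reference EC number straightforward.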

Authors

  • Gi Bae Kim
    Metabolic and Biomolecular Engineering National Research Laboratory, Department of Chemical and Biomolecular Engineering (BK21 four), Korea Advanced Institute of Science and Technology (KAIST), 291 Daehak-ro, Yuseong-gu, Daejeon 34141, Republic of Korea; Systems Metabolic Engineering and Systems Healthcare Cross-Generation Collaborative Laboratory, KAIST, 291 Daehak-ro, Yuseong-gu, Daejeon 34141, Republic of Korea.
  • Ji Yeon Kim
    Metabolic and Biomolecular Engineering National Research Laboratory, Department of Chemical and Biomolecular Engineering (BK21 four), Korea Advanced Institute of Science and Technology (KAIST), Daejeon, 34141, Republic of Korea.
  • Jong An Lee
    Metabolic and Biomolecular Engineering National Research Laboratory, Department of Chemical and Biomolecular Engineering (BK21 four), Korea Advanced Institute of Science and Technology (KAIST), Daejeon, 34141, Republic of Korea.
  • Charles J Norsigian
    Department of Bioengineering, University of California, San Diego, La Jolla, CA 92093, USA.
  • Bernhard O Palsson
    Department of Bioengineering, University of California, San Diego, CA, USA.
  • Sang Yup Lee
    Department of Chemical and Biomolecular Engineering (BK21 Plus Program), KAIST, Daejeon 34141, Republic of Korea.