Deep learning for NAD/NADP cofactor prediction and engineering using transformer attention analysis in enzymes.

Journal: Metabolic engineering
PMID:

Abstract

Understanding and manipulating the cofactor preferences of NAD(P)-dependent oxidoreductases, the most widely distributed enzyme group in nature, is increasingly crucial in bioengineering. However, large-scale identification of cofactor preferences and the design of mutants that switch cofactor specificity remain complex tasks. Here, we introduce DISCODE (Deep learning-based Iterative pipeline to analyze Specificity of COfactors and to Design Enzyme), a novel transformer-based deep learning model that predicts NAD(P) cofactor preferences. For model training, a total of 7,132 NAD(P)-dependent enzyme sequences were collected. Leveraging full-length sequence information, DISCODE classifies the cofactor preferences of NAD(P)-dependent oxidoreductase protein sequences without structural or taxonomic limitations. The model achieved an accuracy of 97.4% and an F1 score of 97.3%. A notable feature of DISCODE is the interpretability of its transformer layers. Analysis of the model's attention layers identifies several residues with significantly higher attention weights. These residues align well with structurally important residues that closely interact with NAD(P), facilitating the identification of key residues that determine cofactor specificity. The key residues were highly consistent with verified cofactor-switching mutants. Integrated into an enzyme design pipeline, DISCODE coupled with attention analysis enables a fully automated approach to redesigning cofactor specificity.
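
The attention analysis described in the abstract can be illustrated with a minimal sketch. The abstract does not state how attention weights are aggregated per residue or how "significantly higher" is decided, so everything below is an assumption made for illustration: the attention map is taken to be a square matrix (rows are query positions, columns are attended residues), each residue is scored by the mean attention it receives, and outliers are flagged with a simple z-score cutoff. The function names `received_attention` and `high_attention_residues` and the `z_cutoff` parameter are hypothetical and are not part of DISCODE.

```python
from statistics import mean, pstdev


def received_attention(attn):
    """Mean attention each residue (column) receives across all query rows."""
    n_rows = len(attn)
    n_cols = len(attn[0])
    return [sum(row[j] for row in attn) / n_rows for j in range(n_cols)]


def high_attention_residues(attn, z_cutoff=2.0):
    """Return 1-based positions whose received attention is a z-score outlier.

    The z-score rule is an assumed stand-in for the significance criterion,
    which the abstract does not specify.
    """
    scores = received_attention(attn)
    mu = mean(scores)
    sigma = pstdev(scores)
    if sigma == 0:
        # Perfectly uniform attention: no residue stands out.
        return []
    return [i + 1 for i, s in enumerate(scores) if (s - mu) / sigma > z_cutoff]


# Toy attention map for a 10-residue sequence (made-up numbers, each row sums to 1).
# Every query position attends mostly to residue 4.
seq_len = 10
toy_attn = [[0.05] * seq_len for _ in range(seq_len)]
for row in toy_attn:
    row[3] = 0.55

print(high_attention_residues(toy_attn))
```

In a redesign setting, positions flagged this way would be candidates for mutation; how the actual pipeline selects and mutates them is not described in the abstract.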

Authors

  • Jaehyung Kim
    School of Energy and Chemical Engineering, Ulsan National Institute of Science and Technology (UNIST), Ulsan, 44919, Republic of Korea.
  • Jihoon Woo
    School of Energy and Chemical Engineering, Ulsan National Institute of Science and Technology (UNIST), Ulsan, 44919, Republic of Korea.
  • Joon Young Park
    School of Energy and Chemical Engineering, Ulsan National Institute of Science and Technology, Ulsan 44919, Republic of Korea.
  • Kyung-Jin Kim
    School of Life Sciences, BK21 FOUR KNU Creative BioResearch Group, KNU Institute of Microbiology, Kyungpook National University, Daegu, 41566, Republic of Korea.
  • Donghyuk Kim
    School of Energy and Chemical Engineering, Ulsan National Institute of Science and Technology, Ulsan 44919, Republic of Korea.