A novel interpretability framework for enzyme turnover number prediction boosted by pre-trained enzyme embeddings and adaptive gate network.

Journal: Methods (San Diego, Calif.)
PMID:

Abstract

Determining the enzyme turnover number (kcat) is a vital step in synthetic biology and early-stage drug discovery. Recently, deep learning methods have made encouraging progress in predicting kcat, aided by the growing availability of turnover number data for multi-species enzyme-substrate pairs. However, the performance of existing approaches still depends heavily on how effectively features are extracted from enzymes and substrates, and on how well these two types of features are fused. Furthermore, it is essential to identify the key molecular substructures that significantly affect kcat prediction. To address these issues, we develop GELKcat, a novel end-to-end dual-representation interpretability framework that uses graph transformers to encode substrate molecules and CNNs to process enzyme word2vec embeddings. We further integrate substrate and enzyme features with an adaptive gate network, which assigns optimal weights to capture the most suitable feature combinations. Comparison with several state-of-the-art methods demonstrates the superiority of GELKcat, and ablation studies further highlight the contributions of its three main components. Furthermore, case studies illustrate the interpretability of GELKcat by identifying key functional groups in a substrate that are significantly associated with turnover number. We anticipate that this work can bridge current gaps in enzyme-substrate representation and provide guidance for drug discovery and synthetic biology.
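
The idea of fusing substrate and enzyme features through a learned gate can be illustrated with a minimal sketch. This is not the GELKcat implementation: the abstract does not specify the gate's form, so the element-wise sigmoid gate below, along with the names `gated_fusion`, `weights`, and `bias`, is an assumption chosen for illustration. It assumes both feature vectors have already been projected to the same length.

```python
import math
import random


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def gated_fusion(substrate, enzyme, weights, bias):
    """Fuse two equal-length feature vectors with an element-wise gate.

    Hypothetical formulation (an assumption, not taken from the paper):
        gate  = sigmoid(W @ [substrate; enzyme] + b)
        fused = gate * substrate + (1 - gate) * enzyme
    """
    if len(substrate) != len(enzyme):
        raise ValueError("substrate and enzyme vectors must have equal length")
    concat = list(substrate) + list(enzyme)
    gates, fused = [], []
    for i in range(len(substrate)):
        # One gate value per feature dimension, computed from both inputs.
        z = sum(w * x for w, x in zip(weights[i], concat)) + bias[i]
        g = sigmoid(z)
        gates.append(g)
        fused.append(g * substrate[i] + (1.0 - g) * enzyme[i])
    return fused, gates


# Toy inputs standing in for the outputs of the substrate and enzyme encoders.
random.seed(0)
dim = 4
substrate_vec = [random.uniform(-1, 1) for _ in range(dim)]
enzyme_vec = [random.uniform(-1, 1) for _ in range(dim)]
# In a real model these parameters would be learned; here they are random.
weights = [[random.gauss(0, 0.1) for _ in range(2 * dim)] for _ in range(dim)]
bias = [0.0] * dim

fused_vec, gate_values = gated_fusion(substrate_vec, enzyme_vec, weights, bias)
```

Each gate value lies strictly between 0 and 1, so every fused feature is a convex combination of the corresponding substrate and enzyme features. A gate near 1 favors the substrate representation and a gate near 0 favors the enzyme representation.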

Authors

  • Bing-Xue Du
    School of Life Sciences, Northwestern Polytechnical University, Xi'an 710072, China.
  • Haoyang Yu
    Hebei Provincial Key Laboratory of Parallel Robot and Mechatronic System, Yanshan University, Qinhuangdao, 066004, China.
  • Bei Zhu
    School of Life Sciences, Northwestern Polytechnical University, Xi'an 710072, China.
  • Yahui Long
    College of Computer Science and Electronic Engineering, Hunan University, Changsha, 410000, China.
  • Min Wu
    Guizhou University of Traditional Chinese Medicine, Guiyang, Guizhou Province, China.
  • Jian-Yu Shi
    School of Life Sciences, Northwestern Polytechnical University, Xi'an, Shaanxi 710072, China.