Multi-modal deep learning enables efficient and accurate annotation of enzymatic active sites.

Journal: Nature Communications
PMID:

Abstract

Annotating active sites in enzymes is crucial for advancing multiple fields, including drug discovery, disease research, enzyme engineering, and synthetic biology. Despite the development of numerous automated annotation algorithms, a significant trade-off between speed and accuracy limits their large-scale practical application. We introduce EasIFA, an enzyme active site annotation algorithm that fuses latent enzyme representations from a protein language model and a 3D structural encoder, and then aligns protein-level information with knowledge of enzymatic reactions using a multi-modal cross-attention framework. EasIFA outperforms BLASTp, running 10 times faster while improving recall, precision, F1 score, and MCC by 7.57%, 13.08%, 9.68%, and 0.1012, respectively. It also surpasses an empirical-rule-based algorithm and another state-of-the-art deep learning annotation method based on PSSM features, achieving a speed increase of 650 to 1400 times while enhancing annotation quality. This makes EasIFA a suitable replacement for conventional tools in both industrial and academic settings. EasIFA can also effectively transfer knowledge gained from coarsely annotated enzyme databases to smaller, high-precision datasets, highlighting its ability to model sparse, high-quality databases. Additionally, EasIFA shows potential as a catalytic site monitoring tool for designing enzymes with desired functions beyond their natural distribution.
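
The four quality metrics named in the abstract (recall, precision, F1 score, and MCC) can be illustrated with a minimal sketch. It assumes annotation is scored as a binary, per-residue decision (active site or not) on a single enzyme. The function name `site_metrics`, its arguments, and the example residue positions are hypothetical, and the paper's actual evaluation protocol (for example, averaging across enzymes or distinguishing site types) may differ.

```python
from math import sqrt


def site_metrics(true_sites, predicted_sites, sequence_length):
    """Residue-level metrics for one enzyme (hypothetical helper).

    true_sites / predicted_sites: iterables of residue positions.
    sequence_length: total number of residues in the enzyme.
    """
    true_sites = set(true_sites)
    predicted_sites = set(predicted_sites)

    # Confusion-matrix counts over all residues in the sequence.
    tp = len(true_sites & predicted_sites)
    fp = len(predicted_sites - true_sites)
    fn = len(true_sites - predicted_sites)
    tn = sequence_length - tp - fp - fn

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)

    # Matthews correlation coefficient; defined as 0 when any marginal is empty.
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0

    return {"precision": precision, "recall": recall, "f1": f1, "mcc": mcc}


# Illustrative values only: three annotated sites, two recovered, one false hit.
example = site_metrics({10, 57, 102}, {10, 57, 200}, sequence_length=300)
print(example)
```

Because active-site residues are a small fraction of any sequence, MCC is a useful complement to F1: it also accounts for true negatives and ranges from -1 to 1, which is why the abstract reports its improvement as an absolute difference rather than a percentage.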

Authors

  • Xiaorui Wang
    Structural Biophysics Group, School of Optometry and Vision Sciences, Cardiff University, Cardiff, Wales, UK.
  • Xiaodan Yin
    Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, Zhejiang, China.
  • Dejun Jiang
    Innovation Institute for Artificial Intelligence in Medicine, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058 Zhejiang, P. R. China.
  • Huifeng Zhao
    Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, China.
  • Zhenxing Wu
  • Odin Zhang
    Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, Zhejiang, China.
  • Jike Wang
    School of Computer Science, Wuhan University, Wuhan, Hubei 430072, China.
  • Yuquan Li
    College of Chemistry and Chemical Engineering at Lanzhou University.
  • Yafeng Deng
    Hangzhou Carbonsilicon AI Technology Co., Ltd, Hangzhou 310018, Zhejiang, China.
  • Huanxiang Liu
    Lanzhou University.
  • Pei Luo
    Neher's Biophysics Laboratory for Innovative Drug Discovery, State Key Laboratory of Quality Research in Chinese Medicine, Macau Institute for Applied Research in Medicine and Health, Macau University of Science and Technology, Macao, 999078, China.
  • Yuqiang Han
    Department of Computer Science and Engineering, Chinese University of Hong Kong, Hong Kong, 999077, China.
  • Tingjun Hou
    College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, Zhejiang 310058, China.
  • Xiaojun Yao
    Centre for Artificial Intelligence Driven Drug Discovery, Faculty of Applied Sciences, Macao Polytechnic University, Macao, 999078, PR China.
  • Chang-Yu Hsieh
    Tencent Quantum Laboratory, Tencent, Shenzhen 518057 Guangdong, P. R. China.