ifDEEPre: large protein language-based deep learning enables interpretable and fast predictions of enzyme commission numbers.

Journal: Briefings in bioinformatics
Published Date:

Abstract

An accurate understanding of the biological functions of enzymes is vital for many tasks in both pathologies and industrial biotechnology. However, existing methods are usually not fast enough and do not explain their predictions, which severely limits their real-world application. Following our previous work, DEEPre, we propose a new interpretable and fast version (ifDEEPre) that accurately predicts the commission numbers of enzymes and confirms their functions by combining a novel self-guided attention mechanism with biological knowledge learned via large protein language models. The self-guided attention is designed to optimize the unique contributions of representations and automatically detects key protein motifs, providing meaningful interpretations. Representations learned from raw protein sequences are strictly screened to improve the running speed of the framework, making it 50 times faster than DEEPre while requiring 12.89 times less storage space. Large language modules are incorporated to learn physical properties from hundreds of millions of proteins, extending the biological knowledge of the whole network. Extensive experiments indicate that ifDEEPre outperforms all current methods, achieving an F1-score more than 14.22% higher on the NEW dataset. Furthermore, the trained ifDEEPre models accurately capture multi-level protein biological patterns and infer evolutionary trends of enzymes from raw sequences alone, without label information. ifDEEPre also predicts the evolutionary relationships between different yeast sub-species, and these predictions are highly consistent with the ground truth. Case studies indicate that ifDEEPre can detect key amino acid motifs, which has important implications for designing novel enzymes. A web server running ifDEEPre is available to the public at https://proj.cse.cuhk.edu.hk/aihlab/ifdeepre/. ifDEEPre is also freely available on GitHub at https://github.com/ml4bio/ifDEEPre/.
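
The abstract describes attention that weights sequence representations and exposes key motifs. The sketch below is not the self-guided attention of ifDEEPre, whose details the abstract does not give; it is a generic softmax attention pooling, included only to illustrate how per-residue attention weights can double as an interpretation signal. The function names, the query vector, and the toy 2-dimensional residue vectors are all hypothetical.

```python
import math
from typing import List, Tuple


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(
    residue_vectors: List[List[float]], query: List[float]
) -> Tuple[List[float], List[float]]:
    """Return (pooled vector, per-residue attention weights)."""
    # Score each residue by its dot product with the (hypothetical) query.
    scores = [sum(x * q for x, q in zip(vec, query)) for vec in residue_vectors]
    weights = softmax(scores)
    dim = len(residue_vectors[0])
    # The pooled vector is the attention-weighted average of residue vectors.
    pooled = [
        sum(w * vec[d] for w, vec in zip(weights, residue_vectors))
        for d in range(dim)
    ]
    return pooled, weights


def top_residues(
    sequence: str, weights: List[float], k: int = 3
) -> List[Tuple[int, str, float]]:
    """List the k highest-weighted positions as (1-based index, residue, weight)."""
    ranked = sorted(range(len(weights)), key=lambda i: weights[i], reverse=True)
    return [(i + 1, sequence[i], weights[i]) for i in ranked[:k]]


# Toy data (assumption): made-up vectors for a 5-residue sequence.
sequence = "GDSGG"
vectors = [[0.1, 0.0], [0.2, 0.1], [2.0, 1.5], [0.3, 0.2], [0.0, 0.1]]
query = [1.0, 1.0]

pooled, weights = attention_pool(vectors, query)
print(top_residues(sequence, weights, k=1))
```

In this toy setup the third residue receives the largest weight because its vector aligns most strongly with the query. In a trained model the residue vectors and the query would come from learned parameters, and high-weight positions would be the candidates to inspect as motifs.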

Authors

  • Qingxiong Tan
    Department of Computer Science, Hong Kong Baptist University, Hong Kong, Hong Kong.
  • Jin Xiao
    Sichuan University, China.
  • Jiayang Chen
    Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore.
  • Yixuan Wang
    Department of Cardiovascular Surgery, Union Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, Hubei, China.
  • Zeliang Zhang
    Department of Computer Science, University of Rochester, Rochester, New York State, USA.
  • Tiancheng Zhao
    School of Software, Shandong University, Jinan, China.
  • Yu Li
    Department of Public Health, Shihezi University School of Medicine, 832000, China.