NNKcat: deep neural network to predict catalytic constants (Kcat) by integrating protein sequence and substrate structure with enhanced data imbalance handling.

Journal: Briefings in bioinformatics

Published Date: May 1, 2025

Abstract

The catalytic constant (Kcat) describes the efficiency of an enzyme-catalyzed reaction. The Kcat value of an enzyme-substrate pair indicates the rate at which an enzyme converts substrate into product under saturating substrate conditions. However, it is challenging to construct robust prediction models for this important property. Most existing models, including the one recently published in Nature Catalysis (Li et al.), suffer from overfitting. In this study, we propose a novel protocol for constructing Kcat prediction models that introduces an intermediate step in which substrate and protein processors are developed separately. The substrate processor analyzes Simplified Molecular Input Line Entry System (SMILES) strings using a graph neural network model, Attentive FP, while the protein processor abstracts protein sequence information using a long short-term memory (LSTM) architecture. This protocol not only mitigates the impact of data imbalance in the original dataset but also provides greater flexibility in customizing the general-purpose Kcat prediction model to enhance prediction accuracy for specific enzyme classes. Our general-purpose Kcat prediction model demonstrates significantly enhanced stability and slightly better accuracy (R2 value of 0.54 versus 0.50) than Li et al.'s model on the same dataset. Additionally, our modeling protocol enables the general-purpose Kcat model to be fine-tuned for specific enzyme categories through focused learning. Using cytochrome P450 (CYP450) enzymes as a case study, we achieved a best R2 value of 0.64 for the focused model. The high-quality performance and expandability of the model guarantee its broad applicability in enzyme engineering and drug research & development.
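
The R2 values quoted in the abstract compare predicted and observed Kcat values. The abstract does not say which definition of R2 was used or on what scale, so the following is only a minimal sketch under two assumptions: that R2 means the coefficient of determination (1 minus the ratio of residual to total sum of squares), and that values are compared on a log10 scale. The function name `r_squared` and all numbers are hypothetical and are not taken from the paper.

```python
from statistics import fmean


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot.

    Assumption: this is the R2 definition meant in the abstract;
    the source does not confirm it.
    """
    if len(y_true) != len(y_pred) or len(y_true) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_true = fmean(y_true)
    # Residual sum of squares: squared error of the predictions.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: spread of the observed values around their mean.
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R2 is undefined when all observed values are identical")
    return 1.0 - ss_res / ss_tot


# Hypothetical log10(Kcat) values, invented for illustration only.
observed = [0.0, 1.0, 2.0, 3.0]
predicted = [0.5, 1.0, 2.0, 2.5]
score = r_squared(observed, predicted)
```

Under this definition a model that always predicts the mean of the observed values scores 0 and a perfect model scores 1, so a move from 0.50 to 0.54 means a slightly larger share of the variance in the observed values is explained.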

Authors

Jingchen Zhai

Department of Pharmaceutical Sciences and Computational Chemical Genomics Screening Center, School of Pharmacy, University of Pittsburgh, Pittsburgh, PA 15261, USA.

Xiguang Qi

Computational Chemical Genomics Screening Center, Department of Pharmaceutical Sciences/School of Pharmacy, University of Pittsburgh, Pittsburgh, PA 15213, USA.

Lianjin Cai

Department of Pharmaceutical Sciences, Computational Chemical Genomics Screening Center, School of Pharmacy, University of Pittsburgh, Pittsburgh, PA 15261, USA.

Yue Liu

School of Athletic Performance, Shanghai University of Sport, Shanghai, China.

Haocheng Tang

Department of Otolaryngology-Head and Neck Surgery, Nanfang Hospital, Southern Medical University, Guangzhou, Guangdong, China.

Lei Xie

Ph.D. Program in Computer Science, The City University of New York, New York, NY, United States.

Junmei Wang

Department of Pharmaceutical Sciences, Computational Chemical Genomics Screen Center, School of Pharmacy, University of Pittsburgh, 3501 Terrace St, Pittsburgh, PA, 15213, USA; Department of Pharmaceutical Sciences, School of Pharmacy, NIDA National Center of Excellence for Computational Drug Abuse Research, University of Pittsburgh, 3501 Terrace St, Pittsburgh, PA, 15213, USA. Electronic address: junmei.wang@pitt.edu.

Keywords

Algorithms; Amino Acid Sequence; Catalysis; Computational Biology; Neural Networks, Computer; Proteins; Substrate Specificity

External Resources

PubMed ID (PMID): 40370097
