Development of DeepPQK and DeepQK sequence-based deep learning models to predict protein-ligand affinity and application in the directed evolution of ferulic esterase DLfae4.
Journal:
International journal of biological macromolecules
PMID:
40054795
Abstract
Binding affinity plays an essential role in the rate and stability of enzyme-catalyzed reactions and therefore directly affects catalytic activity. Methods for predicting protein-ligand binding affinity generally rely on high-resolution protein crystal structures; however, crystals of some proteins are difficult to grow, and obtaining them is time-consuming and expensive. In this study, two sequence-based deep learning models, DeepPQK and DeepQK, were constructed to predict protein-ligand binding affinity. DeepPQK integrates local and global contextual features using convolutional neural networks (CNNs), taking the protein sequence, the binding-pocket amino acids, and the ligand as inputs. In particular, the protein-binding pocket, which has distinctive properties because it binds the ligand directly, serves as the local input feature for affinity prediction. DeepQK, which consists of a protein sequence module and a ligand module, predicts affinity from these two inputs, enabling identification of the intrinsic relationship between protein sequence and affinity. Dilated convolutions were used to capture multiscale long-range interactions and the sequence-level features of the protein and ligand. On the 2016 core dataset, DeepPQK and DeepQK achieved Pearson correlation coefficients of 0.805 and 0.804, respectively, a significant improvement in accuracy over recent state-of-the-art methods. Once trained, both models can learn the two- and three-dimensional structural properties of proteins and the relative positions of proteins and ligands. Based on these results, a series of variants of the feruloyl esterase DLFae4 were designed using DeepPQK and DeepQK, and the enzyme activities of these mutants were verified experimentally; the best mutant, I149G/W237H/M297C, showed 5.6-fold higher enzyme activity and 10.1-fold higher catalytic efficiency than the wild-type enzyme.
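The abstract attributes the models' handling of long-range sequence context to dilated convolution and reports accuracy as a Pearson correlation coefficient. The following is a minimal, framework-free sketch of both ideas, assuming a one-channel numeric encoding of a sequence, a shared kernel of size 3, and dilation rates 1, 2, 4, 8. These values, the function names, and the toy inputs are illustrative assumptions only and are not taken from the DeepPQK/DeepQK code in the linked repository. As is usual in deep learning, "convolution" here means cross-correlation with no padding and stride 1.

```python
import math
from typing import List, Sequence


def dilated_conv1d(x: Sequence[float], weights: Sequence[float], dilation: int) -> List[float]:
    """Valid (unpadded) 1D convolution whose kernel taps are spaced `dilation` positions apart."""
    k = len(weights)
    span = (k - 1) * dilation + 1  # input positions covered by one kernel application
    return [
        sum(weights[j] * x[i + j * dilation] for j in range(k))
        for i in range(len(x) - span + 1)
    ]


def stacked_dilated_features(x: Sequence[float], weights: Sequence[float],
                             dilations: Sequence[int]) -> List[float]:
    """Apply one dilated convolution per dilation rate, each followed by a ReLU."""
    h = list(x)
    for d in dilations:
        h = [max(0.0, v) for v in dilated_conv1d(h, weights, d)]
    return h


def receptive_field(kernel_size: int, dilations: Sequence[int]) -> int:
    """Number of input positions that influence one output of the stacked layers (stride 1)."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation between, e.g., measured and predicted affinities."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sx = math.sqrt(sum((a - mx) ** 2 for a in xs))
    sy = math.sqrt(sum((b - my) ** 2 for b in ys))
    return cov / (sx * sy)


if __name__ == "__main__":
    toy = list(range(10))  # toy per-position values, not real sequence features
    print(dilated_conv1d(toy, [1, 1, 1], dilation=2))  # each output sums x[i], x[i+2], x[i+4]
    print(receptive_field(3, [1, 2, 4, 8]), receptive_field(3, [1, 1, 1, 1]))
    feats = stacked_dilated_features([0.1] * 40, [1.0, 1.0, 1.0], [1, 2, 4, 8])
    print(len(feats), max(feats))  # global max pooling gives one sequence-level value
```

With kernel size 3, four layers with dilations 1, 2, 4, 8 see 31 consecutive positions, compared with 9 for four undilated layers. Doubling the dilation at each layer therefore widens the context exponentially while the parameter count stays the same. A global max pool over the final feature map is one common way to obtain a fixed-length, sequence-level summary. Whether the published models use exactly this pooling is not stated in the abstract.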
In conclusion, the DeepPQK and DeepQK deep learning models overcome the dependence of traditional methods on protein crystal structures and have been successfully applied to guide the directed evolution of enzymes, providing a new approach to enzyme directed evolution. The source code is available at https://github.com/KK-SW1207/DeepPQK_QK.