Glydentify: An explainable deep learning platform for glycosyltransferase donor substrate prediction
Journal: bioRxiv
Published Date: Mar 17, 2026
Abstract
Glycosyltransferases (GTs) are a large family of enzymes that catalyze the formation of glycosidic linkages between chemically diverse donor and acceptor molecules, thereby regulating a wide range of cellular processes across all domains of life. Despite their importance, the activated sugar donors (donor substrates) used by most GTs remain unidentified, limiting our understanding of GT function. To address this challenge, we developed Glydentify, a deep learning framework that predicts donor usage for GT-A-fold and GT-B-fold glycosyltransferases. Trained on large-scale UniProt annotations, Glydentify integrates protein sequence embeddings from protein language models with chemical features from molecular encoders trained on extensive chemical datasets. The resulting models achieve high predictive performance, with areas under the precision-recall curve (PR-AUC) of 0.86 for GT-A and 0.91 for GT-B, surpassing general enzyme-substrate predictors while requiring minimal manual curation. We used Glydentify to predict the donor specificity of uncharacterized plant GTs and tested these predictions experimentally with in vitro biochemical assays. Furthermore, residue-level attention score analysis shows that the model draws on a combination of evolutionary, structural, and biochemical features to predict donor specificity. Together, these results establish Glydentify as a robust, explainable framework for decoding donor-glycosyltransferase relationships and highlight its potential as a broadly applicable approach for modeling enzyme classes that act on chemically diverse substrates.
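The abstract describes pairing a protein-sequence embedding with a chemical embedding of each candidate donor, scoring the pair, and evaluating the scores with PR-AUC. The Python sketch below illustrates that general idea only; it is not Glydentify's architecture. The embeddings, the weight matrix `W`, and the helper names (`pair_score`, `predict_donor`, `average_precision`) are all hypothetical. A bilinear interaction term stands in for whatever fusion the actual model uses, and PR-AUC is estimated with average precision, one common step-wise estimator.

```python
import math
from typing import Dict, List, Sequence

# HYPOTHETICAL toy embeddings: stand-ins for a protein language model embedding
# (one vector per GT) and a molecular-encoder embedding (one vector per donor).
# Real embeddings are far higher-dimensional and learned, not hand-written.
PROTEIN_EMB: Dict[str, List[float]] = {
    "GT_1": [0.9, 0.1, 0.3],
    "GT_2": [0.2, 0.8, 0.5],
}
DONOR_EMB: Dict[str, List[float]] = {
    "UDP-Glc": [1.0, 0.0],
    "GDP-Man": [0.0, 1.0],
}

# HYPOTHETICAL hand-picked weights (3 protein dims x 2 donor dims), not trained.
W = [[2.0, -1.0], [-1.0, 2.0], [0.5, 0.5]]
BIAS = -0.5


def pair_score(protein: Sequence[float], donor: Sequence[float]) -> float:
    """Bilinear interaction p^T W d + b, squashed to a probability by a sigmoid."""
    z = BIAS + sum(
        protein[i] * W[i][j] * donor[j]
        for i in range(len(protein))
        for j in range(len(donor))
    )
    return 1.0 / (1.0 + math.exp(-z))


def predict_donor(gt: str) -> str:
    """Return the candidate donor with the highest pair score for one GT."""
    return max(DONOR_EMB, key=lambda d: pair_score(PROTEIN_EMB[gt], DONOR_EMB[d]))


def average_precision(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Step-wise PR-AUC estimate: mean precision at the rank of each positive."""
    ranked = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)
    n_pos = sum(labels)
    hits, total = 0, 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / rank
    return total / n_pos if n_pos else 0.0


# Score every GT-donor pair against toy ground-truth labels (1 = uses donor).
truth = {("GT_1", "UDP-Glc"): 1, ("GT_1", "GDP-Man"): 0,
         ("GT_2", "UDP-Glc"): 0, ("GT_2", "GDP-Man"): 1}
pairs = list(truth)
scores = [pair_score(PROTEIN_EMB[g], DONOR_EMB[d]) for g, d in pairs]
labels = [truth[p] for p in pairs]
print(predict_donor("GT_1"), predict_donor("GT_2"))  # UDP-Glc GDP-Man
print(average_precision(scores, labels))  # 1.0
```

An interaction term is used here instead of a linear layer over concatenated features because a purely additive score, `a·p + c·d + b`, ranks donors identically for every GT. The bilinear term lets donor preference depend on the protein.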