Building simplified cancer subtyping and prediction models with glycan gene signatures
Journal:
Cell Reports Methods
Published Date:
Aug 11, 2025
Abstract
We identified a gene panel comprising 71 glycosyltransferases (GTs) that alter glycan patterns on cancer cells as the cells become more virulent. When these cancer-pattern GTs (CPGTs) were run through an algorithm trained on The Cancer Genome Atlas, they distinguished tumors from healthy tissue with 97% accuracy and clustered 27 cancer types with 94% accuracy in external validation, revealing each cancer type's "biometric glycan ID." Using machine learning, we built four cancer classification models, including two that detect the molecular subtypes of breast cancer and glioma using even smaller CPGT sets. Our results show the power of glyco-genes for diagnostics: in independent testing, our breast cancer classifier was almost twice as effective as the widely used Prediction Analysis of Microarray 50 (PAM50) subtyping kit at distinguishing luminal A, luminal B, HER2-enriched, and basal-like breast cancers, using a comparable number of genes. Only four GT genes were needed to build a prognostic model for glioma survival.