The machine-learning classifier ALLCatchR2 identifies 20 T-ALL subtypes across cohorts and age groups
Journal: bioRxiv
Published Date: Jun 3, 2026
Abstract
T-cell acute lymphoblastic leukemia (T-ALL) comprises molecularly diverse subtypes, but robust cross-cohort validations and operational gene-expression definitions are lacking. To establish a gene-expression-anchored framework for T-ALL subtyping, we aggregated 2,314 transcriptomes (15 cohorts; age range: 0.8 to 90.8 years). An extended unsupervised approach defined 17 main clusters and 3 subclusters in samples with high blast fractions. Supervised analyses added an overarching immature T-ALL (ETP-like) definition and resolved the LMO2 γδ-like subtype. All clusters contained samples from at least two cohorts. Characteristic genomic driver enrichments were consistent across cohorts, while gene expression clusters did not correspond exclusively to single driver events but also reflected developmental origins. A machine learning classifier based on ALLCatchR, our B-ALL classifier, identified these 20 transcriptomic subtypes and the immature T-ALL (ETP-like) signature with 0.995-1.0 accuracy in a validation set (n=203). Testing the classifier on a second hold-out data set (n=265 samples) showed that 92.7% of predictions matched the corresponding driver alterations. Across all samples, 83.2% of cases received high-confidence predictions, 7.3% received candidate predictions, and 9.5% remained unclassified, largely because of low blast fractions. We identified a novel gene expression cluster markedly enriched (P<0.001) for clonal hematopoiesis mutations (IDH2 R140Q, DNMT3A) and characterized by a stem-/progenitor cell-like gene expression profile. This novel "clonal hematopoiesis-related" T-ALL subtype was observed in six cohorts, representing 8.9% of adults and 39.5% of patients aged >50 years. We advanced ALLCatchR as a free R package that now enables B-/T-lineage separation, gene-expression subtyping, blast estimation, and developmental annotation to harmonize T-ALL classification across studies and clinical contexts.
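
The three reporting tiers named in the abstract (high-confidence, candidate, unclassified) can be illustrated with a minimal Python sketch. It is not taken from the ALLCatchR package, which is written in R. The abstract does not state how the tiers are assigned, so the score cutoffs, the function names `assign_tier` and `tier_shares`, and the example scores are all hypothetical and serve only to show how per-sample scores could be turned into tier percentages.

```python
from collections import Counter

# Hypothetical cutoffs: the abstract does not state the real thresholds.
HIGH_CONFIDENCE_CUTOFF = 0.8
CANDIDATE_CUTOFF = 0.5

TIERS = ("high-confidence", "candidate", "unclassified")


def assign_tier(top_score: float) -> str:
    """Map a sample's top subtype score to one of the three tiers."""
    if top_score >= HIGH_CONFIDENCE_CUTOFF:
        return "high-confidence"
    if top_score >= CANDIDATE_CUTOFF:
        return "candidate"
    return "unclassified"


def tier_shares(scores: list[float]) -> dict[str, float]:
    """Return the percentage of samples falling into each tier."""
    if not scores:
        raise ValueError("at least one score is required")
    counts = Counter(assign_tier(s) for s in scores)
    return {tier: 100.0 * counts[tier] / len(scores) for tier in TIERS}


# Made-up scores for five samples, purely for illustration.
example_scores = [0.95, 0.91, 0.85, 0.62, 0.30]
shares = tier_shares(example_scores)
```

Applied to a real cohort, a tally of this kind would yield a breakdown such as the 83.2% / 7.3% / 9.5% split reported above, although the actual assignment rule may differ from this sketch.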