A generative reference grammar of healthy TCR repertoires reveals cancer-associated immune remodeling
Journal: bioRxiv
Published Date: May 4, 2026
Abstract
T-cell receptor (TCR) repertoires encode the organization of adaptive immunity and its reshaping by cancer and therapy, but disentangling treatment-associated structure from V(D)J recombination constraints remains challenging. We present CRAFT (Cancer Repertoire Anomaly Finding Transformer), a conditional sequence-to-sequence transformer that learns a nucleotide-level generative grammar of productive TCR-beta CDR3 sequences from healthy-donor repertoires, conditioned on germline V(D)J assignments. A dual-head decoder mirrors the independence of V-D and D-J recombination, and curriculum training yields embeddings that serve as a reference coordinate system for quantifying structured deviations in cancer-associated repertoires. In proof-of-concept analyses of a checkpoint blockade cohort (n=18) and a two-patient single-cell study of oncolytic immunotherapy, CRAFT-derived geometric metrics capture response-associated immune remodeling, including longitudinal shifts in repertoire organization. In antigen-labeled benchmarks, CRAFT yields coherent organization across specificity classes while highlighting settings where CDR3-beta alone provides only partial signal.
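The abstract does not say how CRAFT's geometric metrics are defined. As a minimal sketch, assuming each CDR3 sequence has already been mapped to a fixed-length embedding vector, one simple way to measure deviation from a healthy reference coordinate system is a diagonal Mahalanobis distance: fit per-dimension means and standard deviations on healthy-donor embeddings, then score a repertoire by how far its sequences sit from that centroid. A longitudinal shift can be scored the same way, as the distance between two timepoints' centroids in reference-standardized coordinates. The functions `fit_reference`, `repertoire_deviation`, and `longitudinal_shift` are hypothetical illustrations, not the authors' method or any published CRAFT API.

```python
import math
from statistics import fmean, pstdev

Vector = list[float]


def fit_reference(healthy: list[Vector]) -> tuple[Vector, Vector]:
    """Hypothetical: per-dimension mean and std of healthy-reference embeddings."""
    dims = range(len(healthy[0]))
    mu = [fmean([v[d] for v in healthy]) for d in dims]
    # Guard against zero-variance dimensions so standardization never divides by 0.
    sd = [pstdev([v[d] for v in healthy]) or 1.0 for d in dims]
    return mu, sd


def standardize(v: Vector, mu: Vector, sd: Vector) -> Vector:
    """Express an embedding in reference-standardized (z-score) coordinates."""
    return [(x - m) / s for x, m, s in zip(v, mu, sd)]


def deviation(v: Vector, mu: Vector, sd: Vector) -> float:
    """Diagonal Mahalanobis distance of one embedding from the healthy centroid."""
    return math.sqrt(sum(z * z for z in standardize(v, mu, sd)))


def repertoire_deviation(sample: list[Vector], mu: Vector, sd: Vector) -> float:
    """Hypothetical repertoire score: mean per-sequence deviation from the reference."""
    return fmean([deviation(v, mu, sd) for v in sample])


def longitudinal_shift(before: list[Vector], after: list[Vector],
                       mu: Vector, sd: Vector) -> float:
    """Hypothetical: distance between two timepoints' centroids in standardized space."""
    dims = range(len(mu))
    c0 = [fmean([v[d] for v in before]) for d in dims]
    c1 = [fmean([v[d] for v in after]) for d in dims]
    return math.dist(standardize(c0, mu, sd), standardize(c1, mu, sd))


if __name__ == "__main__":
    # Toy 2-D embeddings; real model embeddings would be much higher-dimensional.
    healthy = [[0.0, 0.0], [2.0, 0.0], [0.0, 2.0], [2.0, 2.0]]
    mu, sd = fit_reference(healthy)
    print(repertoire_deviation([[1.0, 1.0], [4.0, 5.0]], mu, sd))  # 2.5
    print(longitudinal_shift([[1.0, 1.0]], [[4.0, 5.0]], mu, sd))  # 5.0
```

The diagonal covariance keeps the sketch dependency-free and avoids inverting a covariance matrix; a full-covariance Mahalanobis distance or a density-based score under the generative model itself would be natural alternatives if embedding dimensions are strongly correlated.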