Classification of autoimmune diseases from peripheral blood TCR repertoires by multimodal multi-instance learning
Journal:
arXiv
Published Date:
Jul 7, 2025
Abstract
T cell receptor (TCR) repertoires encode critical immunological signatures
for autoimmune diseases, yet their clinical application remains limited by
sequence sparsity and low witness rates. We developed EAMil, a multi-instance
deep learning framework that leverages TCR sequencing data to diagnose systemic
lupus erythematosus (SLE) and rheumatoid arthritis (RA) with exceptional
accuracy. By integrating PrimeSeq feature extraction with ESMonehot encoding
and enhanced gate attention mechanisms, our model achieved state-of-the-art
performance with AUCs of 98.95% for SLE and 97.76% for RA. EAMil successfully
identified disease-associated genes with over 90% concordance with established
differential analyses and effectively distinguished disease-specific TCR genes.
The model remained robust when classifying multiple disease categories,
stratified SLE patients by disease severity using the SLEDAI score, diagnosed
the site of damage in SLE patients, and effectively controlled for confounding
factors such as age and gender. This interpretable framework
for immune receptor analysis provides new insights for autoimmune disease
detection and classification with broad potential clinical applications across
immune-mediated conditions.
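
To make the multi-instance idea concrete, the following is a minimal sketch of gated attention pooling as it is commonly formulated in the multi-instance learning literature. It is not the EAMil implementation: the abstract does not specify the exact form of its enhanced gate attention, and the function name, the parameter names (H, V, U, w), and the random toy data are all assumptions made for illustration. Each repertoire is treated as a bag of per-sequence embeddings, and the attention weights indicate which sequences contribute most to the bag-level representation.

```python
import math
import random


def gated_attention_pool(H, V, U, w):
    """Pool a bag of instance embeddings into one vector (hypothetical helper).

    H: list of n instance embeddings, each of length d (e.g. one per TCR sequence)
    V, U: d x k projection matrices (lists of rows); w: length-k scoring vector
    Returns (z, a): the pooled d-vector and the n attention weights.
    """
    def project(x, M):
        # Compute x^T M for a d-vector x and a d x k matrix M.
        return [sum(x[i] * M[i][j] for i in range(len(x))) for j in range(len(M[0]))]

    scores = []
    for h in H:
        t = [math.tanh(v) for v in project(h, V)]                  # content branch
        g = [1.0 / (1.0 + math.exp(-u)) for u in project(h, U)]    # sigmoid gate
        scores.append(sum(wj * tj * gj for wj, tj, gj in zip(w, t, g)))

    # Softmax over instances, shifted by the maximum for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    a = [e / total for e in exps]

    # Bag representation: attention-weighted sum of instance embeddings.
    d = len(H[0])
    z = [sum(a[n] * H[n][i] for n in range(len(H))) for i in range(d)]
    return z, a


# Toy bag: 5 instances with 8-dimensional embeddings (random, illustrative only).
random.seed(0)
H = [[random.gauss(0, 1) for _ in range(8)] for _ in range(5)]
V = [[random.gauss(0, 1) for _ in range(4)] for _ in range(8)]
U = [[random.gauss(0, 1) for _ in range(4)] for _ in range(8)]
w = [random.gauss(0, 1) for _ in range(4)]
z, a = gated_attention_pool(H, V, U, w)
```

In a trained model of this kind, the pooled vector z would feed a classifier head, and the weights a could be inspected to rank the instances that drive a prediction, which is one way such a framework can be made interpretable.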