Deep generative modeling reveals maturation-linked pairing signatures in human antibodies
Journal: bioRxiv
Published Date: Jan 1, 2025
Abstract
Understanding how antibody heavy and light chains pair is critical for decoding immune repertoire architecture and designing therapeutic antibodies. However, most antibody sequence databases lack paired chain information. To address this gap, we developed a two-stage deep learning framework. First, we pre-trained separate transformer-based language models on large corpora of unpaired heavy and light chain sequences to capture patterns of gene usage and somatic hypermutation. These models were then integrated via lightweight adapters into a sequence-to-sequence model trained in a machine translation setting, enabling light chain generation conditioned on heavy chain input. Although native light chain recovery was moderate, the model consistently captured functionally meaningful constraints: generated sequences exhibited high germline identity, improved structural quality of predicted folds, and broader coverage of framework and CDR regions. Immunologically, heavy chains from memory B cells preferentially generated light chains with more restricted V gene usage, reflecting maturation-dependent selection. Additionally, generated κ light chains displayed a trimodal similarity distribution, suggesting distinct functional pairing modes ranging from promiscuous to highly specific. This work shows that sequence-to-sequence modeling can uncover inter-chain dependencies and generate structurally and immunologically plausible antibody pairs, providing a foundation for computational repertoire analysis and therapeutic design.

A deep generative modeling framework enables conditional generation of light chains from heavy chains, leveraging unpaired data.
Conditioning enhances the structural quality and germline coherence of predicted antibodies.
Memory B cell-derived heavy chains preferentially generate light chains with restricted V gene usage, consistent with maturation-dependent selection.
Generated κ light chains show a trimodal similarity distribution, suggesting discrete pairing modes ranging from promiscuous to highly specific.
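
The abstract reports germline identity and native light chain recovery as evaluation metrics but does not say how they were computed. As a minimal sketch, assuming the two sequences have already been aligned to equal length (for example by a shared numbering scheme), percent identity could be computed as follows. The function name `percent_identity`, the gap character, and the toy sequences are illustrative assumptions, not the authors' pipeline.

```python
def percent_identity(seq_a: str, seq_b: str, gap: str = "-") -> float:
    """Percent identity between two pre-aligned sequences.

    Assumption: both strings are already aligned to the same length.
    Positions where either sequence has a gap are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be pre-aligned to equal length")

    compared = 0
    matches = 0
    for a, b in zip(seq_a, seq_b):
        if a == gap or b == gap:
            continue  # ignore gapped columns
        compared += 1
        if a == b:
            matches += 1

    if compared == 0:
        return 0.0
    return 100.0 * matches / compared


if __name__ == "__main__":
    # Toy fragments for illustration only, not real repertoire data.
    generated = "DIQMTQSPSS"
    reference = "DIVMTQSPSS"
    print(percent_identity(generated, reference))  # 90.0
```

The same function could serve either metric under these assumptions: compared against a germline V gene it gives a germline identity, and compared against the natively paired light chain it gives a per-sequence recovery score.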