High-fidelity in silico generation and augmentation of TCR repertoire data using generative adversarial networks.

Journal: Scientific Reports
Published Date:

Abstract

Engineered T-cell receptor (eTCR) systems rely on accurately generated T-cell receptor (TCR) sequences to enhance immunotherapy predictability and efficacy. The most variable and crucial part of the TCR is the complementarity-determining region 3 (CDR3). Current methods for generating CDR3 sequences, including motif-based and Markov models, struggle to produce reliable, diverse, and novel TCR sequences. In this study, we present the first application of Generative Adversarial Networks (GANs) for producing biologically reliable CDR3 sequences, using two architectures: Long Short-Term Memory (LSTM)-based and LeakyReLU-based GANs. Our results show that the LSTM models generate more diverse sequences with higher accuracy, lower discriminator loss, and higher AUC than the LeakyReLU models. However, the LeakyReLU models provide greater stability with a lower generator loss, achieving a total Pearson correlation score of over 0.9. Both models can produce highly realistic TCR sequences, as validated by t-SNE clustering, frequency distribution analysis, TCRd3 BLAST analysis, and in silico docking. These findings highlight the potential of GANs as a powerful tool for generating synthetic yet biologically relevant TCR sequences, a crucial step toward improving eTCR-based therapies. Further refinement of amino acid frequency distributions and clinical validation will enhance their applicability for therapeutic purposes.
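
To illustrate the kind of frequency distribution comparison the abstract refers to, the following is a minimal sketch, not the authors' pipeline. It assumes that the comparison is made over per-residue amino acid frequencies of real versus generated CDR3 sequences and that agreement is summarised with a Pearson correlation; the abstract does not specify how its Pearson score was computed. The function names `aa_frequencies` and `pearson` are hypothetical, and the sequences are invented toy strings, not data from the study.

```python
from collections import Counter
from math import sqrt

# The 20 standard amino acids, in a fixed order so vectors are comparable.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def aa_frequencies(sequences):
    """Relative frequency of each standard amino acid across all sequences."""
    counts = Counter(aa for seq in sequences for aa in seq if aa in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no standard amino acids found")
    return [counts[aa] / total for aa in AMINO_ACIDS]


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length vectors."""
    if len(x) != len(y) or not x:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return cov / sqrt(var_x * var_y)


# Invented toy sequences for illustration only (assumption, not study data).
real = ["CASSLGQGAYEQYF", "CASSPGTGGNEQFF", "CASSLAGGTDTQYF"]
generated = ["CASSLGTGAYEQYF", "CASSPGQGGNEQFF", "CASSLAGGSDTQYF"]

r = pearson(aa_frequencies(real), aa_frequencies(generated))
print(f"Pearson r between amino acid frequency profiles: {r:.3f}")
```

A value of r close to 1 would indicate that the generated set reproduces the overall amino acid composition of the reference set. It says nothing about positional motifs or sequence novelty, which is why the abstract lists additional validation steps.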

Authors

  • Piotr Religa
    Department of Medicine, Karolinska Institute, Visionsgatan 18, 171 76, Solna, Sweden. piotr.religa@ki.se.
  • Michel-Edwar Mickael
    Institute of Genetics and Animal Biotechnology, Polish Academy of Sciences, Postepu 36A, 05-552, Jastrzebiec, Poland. m.mickael@igbzpan.pl.
  • Norwin Kubick
    Department of Biology, Institute of Plant Science and Microbiology, University of Hamburg, Ohnhorststr. 18, 22609, Hamburg, Germany.
  • Jarosław Olav Horbańczuk
    Institute of Genetics and Animal Biotechnology of the Polish Academy of Sciences, ul. Postepu 36A, Jastrzebiec, 05-552 Magdalenka, Poland.
  • Nikko Floretes
    College of Engineering, Samar State University, University Access Rd, 6700, Catbalogan City, Philippines.
  • Mariusz Sacharczuk
    Department of Experimental Genomics, Institute of Genetics and Animal Biotechnology of the Polish Academy of Sciences, ul. Postepu 36A, Jastrzebiec, 05-552 Magdalenka, Poland.
  • Atanas G Atanasov
    Institute of Genetics and Animal Biotechnology of the Polish Academy of Sciences, Jastrzebiec, 05-552, Magdalenka, Poland.