Simulation of adaptive immune receptors and repertoires with complex immune information to guide the development and benchmarking of AIRR machine learning.

Journal: Nucleic acids research

PMID: 39873270

Abstract

Machine learning (ML) has shown great potential in the adaptive immune receptor repertoire (AIRR) field. However, there is a lack of large-scale ground-truth experimental AIRR data suitable for AIRR-ML-based disease diagnostics and therapeutics discovery. Simulated ground-truth AIRR data are required to complement the development and benchmarking of robust and interpretable AIRR-ML methods where experimental data are currently inaccessible or insufficient. For simulated data to be useful, they must incorporate key features observed in experimental repertoires. These features, such as antigen- or disease-associated immune information, are what make AIRR-ML problems challenging. Here, we introduce LIgO, a software suite that simulates AIRR data for the development and benchmarking of AIRR-ML methods. LIgO incorporates different types of immune information at both the receptor and the repertoire level and preserves a native-like generation probability distribution. Additionally, LIgO assists users in determining the computational feasibility of their simulations. We show two examples where LIgO supports the development and validation of AIRR-ML methods: (i) how individuals carrying out-of-distribution immune information impact receptor-level prediction performance and (ii) how immune information co-occurring in the same adaptive immune receptors (AIRs) impacts the performance of conventional receptor-level encoding and repertoire-level classification approaches. LIgO guides the advancement and assessment of interpretable AIRR-ML methods.

Authors

Maria Chernigovskaya

Department of Immunology, Oslo University Hospital Rikshospitalet and University of Oslo, Norway.
Milena Pavlovic

UiO: RealArt Convergence Environment, University of Oslo, Oslo, Norway.
Chakravarthi Kanduri

Centre for Bioinformatics, Department of Informatics, University of Oslo, Oslo 0373, Norway.
Sofie Gielis

Adrem Data Lab, Department of Computer Science, University of Antwerp, Antwerp, Belgium; Antwerp Unit for Data Analysis and Computation in Immunology and Sequencing (AUDACIS), University of Antwerp, Antwerp, Belgium; Biomedical Informatics Research Network Antwerp (Biomina), University of Antwerp, Antwerp, Belgium.
Philippe A Robert

Department of Immunology, Oslo University Hospital Rikshospitalet and University of Oslo, Norway.
Lonneke Scheffer

Department of Informatics, University of Oslo, Oslo, Norway.
Andrei Slabodkin

Department of Immunology, Oslo University Hospital Rikshospitalet and University of Oslo, Norway.
Ingrid Hobæk Haff

Department of Mathematics, University of Oslo, Oslo, 0851, Norway.
Pieter Meysman

Adrem Data Lab, Department of Computer Science, University of Antwerp, Antwerp, Belgium; Antwerp Unit for Data Analysis and Computation in Immunology and Sequencing (AUDACIS), University of Antwerp, Antwerp, Belgium; Biomedical Informatics Research Network Antwerp (Biomina), University of Antwerp, Antwerp, Belgium.
Gur Yaari

Bioengineering, Faculty of Engineering, Bar Ilan University, Ramat Gan, Israel.
Geir Kjetil Sandve

UiO: RealArt Convergence Environment, University of Oslo, Oslo, Norway.
Victor Greiff

Department of Immunology, Oslo University Hospital, Oslo, Norway.

Keywords

Adaptive Immunity; Benchmarking; Computer Simulation; Humans; Machine Learning; Receptors, Immunologic; Software

