Weakly supervised learning uncovers phenotypic signatures in single-cell data

Journal: bioRxiv
Published Date:

Abstract

To deliver clinically relevant insights from large patient cohorts profiled with single-cell technologies, a key challenge is to relate sample-level and single-cell measurements. We present MultiMIL, a deep learning framework that applies attention-based multiple-instance learning for phenotype prediction and cell state identification. We applied MultiMIL to peripheral blood mononuclear cells from COVID-19 patients, the Human Lung Cell Atlas, and a spatial proteomics breast cancer dataset, demonstrating how the model can be used to find phenotype-associated cell states, learn phenotype-informed sample representations, and expand disease signatures.
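
As a rough illustration of what attention-based multiple-instance learning means here, the sketch below treats one sample as a bag of cell embeddings, scores each cell with a small attention function, and pools the cells into a single sample embedding that a classifier can use. It is not the MultiMIL implementation: the function names, the tanh scoring form, the logistic classifier, and the random parameters are all assumptions chosen for illustration, and a real model would learn these parameters from sample-level labels.

```python
import math
import random


def attention_pool(cells, V, w):
    """Aggregate per-cell embeddings into one sample embedding.

    cells: list of cell embeddings (each a list of floats, length d)
    V:     hidden-by-d projection matrix as a list of rows (hypothetical parameter)
    w:     attention vector of length hidden (hypothetical parameter)
    Returns (sample_embedding, attention_weights).
    """
    # Unnormalised attention score per cell: w . tanh(V h)
    scores = []
    for h in cells:
        hidden = [math.tanh(sum(v * x for v, x in zip(row, h))) for row in V]
        scores.append(sum(a * z for a, z in zip(w, hidden)))

    # Softmax over the cells in the bag (shifted by the max for stability)
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]

    # Attention-weighted sum of the cell embeddings
    d = len(cells[0])
    pooled = [sum(a * h[j] for a, h in zip(weights, cells)) for j in range(d)]
    return pooled, weights


def predict_phenotype(pooled, beta, bias):
    """Logistic classifier on the pooled sample embedding (assumed form)."""
    logit = sum(b * x for b, x in zip(beta, pooled)) + bias
    return 1.0 / (1.0 + math.exp(-logit))


# Toy data: one sample with 5 cells, each embedded in 4 dimensions.
rng = random.Random(0)
d, hidden, n_cells = 4, 3, 5
cells = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(n_cells)]
V = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(hidden)]
w = [rng.gauss(0, 1) for _ in range(hidden)]
beta = [rng.gauss(0, 1) for _ in range(d)]

pooled, weights = attention_pool(cells, V, w)
prob = predict_phenotype(pooled, beta, 0.0)
```

In this kind of setup the attention weights sum to one within a sample, so cells with high weights are the ones that contribute most to the sample-level prediction and can be inspected as candidate phenotype-associated cell states.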

Authors

  • Anastasia Litinetskaya; Soroor Hediyeh-zadeh; Amir Ali Moinfar; Mohammad Lotfollahi; Fabian J. Theis