Mapping single-cell data to reference atlases by transfer learning.

Journal: Nature Biotechnology
Published Date:

Abstract

Large single-cell atlases are now routinely generated to serve as references for analysis of smaller-scale studies. Yet learning from reference data is complicated by batch effects between datasets, limited availability of computational resources and sharing restrictions on raw data. Here we introduce single-cell architectural surgery (scArches), a deep learning strategy for mapping query datasets on top of a reference. scArches uses transfer learning and parameter optimization to enable efficient, decentralized, iterative reference building and contextualization of new datasets with existing references without sharing raw data. Using examples from mouse brain, pancreas, immune and whole-organism atlases, we show that scArches preserves biological state information while removing batch effects, despite using four orders of magnitude fewer parameters than de novo integration. scArches generalizes to multimodal reference mapping, allowing imputation of missing modalities. Finally, scArches retains coronavirus disease 2019 (COVID-19) variation when mapping to a healthy reference, enabling the discovery of disease-specific cell states. scArches will facilitate collaborative projects by enabling iterative construction, updating, sharing and efficient use of reference atlases.
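
The abstract does not spell out how the parameter saving arises, so the following is only an illustrative sketch under stated assumptions, not the authors' implementation. It assumes that a trained reference network is extended with a few new input weights tied to the query's batch labels, that only those new weights are trained, and that all reference weights stay frozen. The function names, layer sizes and zero initialisation are hypothetical choices made for this example.

```python
import numpy as np

def init_reference_layer(n_genes, n_ref_batches, n_hidden, seed=0):
    """First layer of a (hypothetical) trained reference model."""
    rng = np.random.default_rng(seed)
    return {
        "W_genes": rng.normal(scale=0.1, size=(n_hidden, n_genes)),
        "W_batch": rng.normal(scale=0.1, size=(n_hidden, n_ref_batches)),
    }

def add_query_batches(layer, n_new_batches):
    """Assumed 'surgery' step: append zero-initialised weight columns for
    the new query batch labels. Reference weights are reused unchanged."""
    n_hidden = layer["W_batch"].shape[0]
    return {
        "W_genes": layer["W_genes"],                      # frozen
        "W_batch": layer["W_batch"],                      # frozen
        "W_query": np.zeros((n_hidden, n_new_batches)),   # trainable
    }

def forward(layer, x, batch_onehot):
    """ReLU layer over expression values plus one-hot batch labels."""
    w_batch = layer["W_batch"]
    if "W_query" in layer:
        w_batch = np.hstack([w_batch, layer["W_query"]])
    pre = x @ layer["W_genes"].T + batch_onehot @ w_batch.T
    return np.maximum(pre, 0.0)

def count_trainable(layer):
    """Only the appended query weights are updated during mapping."""
    return layer["W_query"].size if "W_query" in layer else 0

def count_total(layer):
    return sum(w.size for w in layer.values())

reference = init_reference_layer(n_genes=2000, n_ref_batches=5, n_hidden=128)
query_model = add_query_batches(reference, n_new_batches=2)
print(count_trainable(query_model), "trainable of", count_total(query_model))
```

In this toy layer only 256 of 256,896 weights would be updated for the query, and reference cells pass through the extended layer unchanged because the new columns start at zero. The ratio here is specific to these made-up sizes and is not meant to reproduce the figure reported in the abstract.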

Authors

  • Mohammad Lotfollahi
    Helmholtz Zentrum München - German Research Center for Environmental Health, Institute of Computational Biology, Neuherberg, Germany.
  • Mohsen Naghipourfar
    Helmholtz Center Munich-German Research Center for Environmental Health, Institute of Computational Biology, Neuherberg, Germany.
  • Malte D Luecken
    Helmholtz Center Munich-German Research Center for Environmental Health, Institute of Computational Biology, Neuherberg, Germany.
  • Matin Khajavi
    Helmholtz Center Munich-German Research Center for Environmental Health, Institute of Computational Biology, Neuherberg, Germany.
  • Maren Büttner
    Helmholtz Center Munich-German Research Center for Environmental Health, Institute of Computational Biology, Neuherberg, Germany.
  • Marco Wagenstetter
    Helmholtz Center Munich-German Research Center for Environmental Health, Institute of Computational Biology, Neuherberg, Germany.
  • Žiga Avsec
    Department of Informatics, Technical University of Munich, 85748 Garching, Germany.
  • Adam Gayoso
    Center for Computational Biology, University of California, Berkeley, CA, USA.
  • Nir Yosef
    Department of Electrical Engineering and Computer Sciences, University of California, Berkeley, CA, USA.
  • Marta Interlandi
    Institute of Medical Informatics, University of Münster, Münster, Germany.
  • Sergei Rybakov
    Helmholtz Center Munich-German Research Center for Environmental Health, Institute of Computational Biology, Neuherberg, Germany.
  • Alexander V Misharin
    Division of Pulmonary and Critical Care Medicine, Feinberg School of Medicine, Northwestern University, Chicago, IL, USA.
  • Fabian J Theis
    Institute of Computational Biology, Helmholtz Zentrum München, German Research Center for Environmental Health, Munich, Germany.