Iterative single-cell multi-omic integration using online learning.

Journal: Nature Biotechnology
Published Date:

Abstract

Integrating large single-cell gene expression, chromatin accessibility and DNA methylation datasets requires general and scalable computational approaches. Here we describe online integrative non-negative matrix factorization (iNMF), an algorithm for integrating large, diverse and continually arriving single-cell datasets. Our approach scales to arbitrarily large numbers of cells using fixed memory, iteratively incorporates new datasets as they are generated and allows many users to simultaneously analyze a single copy of a large dataset by streaming it over the internet. Iterative data addition can also be used to map new data to a reference dataset. Comparisons with previous methods indicate that the improvements in efficiency do not sacrifice dataset alignment and cluster preservation performance. We demonstrate the effectiveness of online iNMF by integrating more than 1 million cells on a standard laptop, integrating large single-cell RNA sequencing and spatial transcriptomic datasets, and iteratively constructing a single-cell multi-omic atlas of the mouse motor cortex.
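The fixed-memory, mini-batch idea in the abstract can be illustrated with a minimal sketch of online non-negative matrix factorization. This is not the authors' implementation: it omits the dataset-specific factors and the penalty term that distinguish iNMF from plain NMF, and it learns a single shared factor matrix. The function names (solve_cell_factors, online_nmf), the multiplicative and coordinate-descent update rules, and all parameter defaults are assumptions chosen for illustration. The point it shows is that only two small summary matrices (A and B) are carried between mini-batches, so memory use does not grow with the number of cells.

```python
import numpy as np


def solve_cell_factors(X_batch, W, n_iter=50, eps=1e-10, seed=0):
    """Estimate non-negative cell factors H for one mini-batch with W held fixed.

    Uses multiplicative updates for min ||X_batch - H @ W||_F^2 subject to H >= 0.
    X_batch is (cells x genes), W is (k x genes), and H is (cells x k).
    """
    rng = np.random.default_rng(seed)
    H = rng.random((X_batch.shape[0], W.shape[0]))
    WWt = W @ W.T          # (k x k), reused across iterations
    XWt = X_batch @ W.T    # (cells x k), reused across iterations
    for _ in range(n_iter):
        H *= XWt / (H @ WWt + eps)
    return H


def online_nmf(batches, n_genes, k, seed=0, eps=1e-10):
    """Learn a non-negative gene-factor matrix W from a stream of mini-batches.

    Only A (k x k) and B (k x genes) persist between batches, so the memory
    footprint is independent of the total number of cells seen.
    """
    rng = np.random.default_rng(seed)
    W = rng.random((k, n_genes))
    A = np.zeros((k, k))          # running sum of H.T @ H
    B = np.zeros((k, n_genes))    # running sum of H.T @ X
    for X_batch in batches:
        H = solve_cell_factors(X_batch, W)
        A += H.T @ H
        B += H.T @ X_batch
        # One sweep of non-negative block coordinate descent over the rows of W,
        # minimizing 0.5 * tr(W.T @ A @ W) - tr(B.T @ W).
        for j in range(k):
            W[j] = np.maximum(W[j] + (B[j] - A[j] @ W) / (A[j, j] + eps), 0.0)
    return W
```

In this sketch, batches can be any iterable that yields non-negative (cells x genes) arrays, for example a generator that reads chunks from disk or over a network connection. Under the same assumptions, a newly arriving dataset could be incorporated by continuing the loop from the stored A, B and W rather than starting again.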

Authors

  • Chao Gao
    College of Marine and Environmental Sciences, Tianjin University of Science and Technology, Tianjin 300457, China.
  • Jialin Liu
    School of Pharmacy, Second Military Medical University, Shanghai, 200433, China.
  • April R Kriebel
    Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA.
  • Sebastian Preissl
    Center for Epigenomics, Department of Cellular and Molecular Medicine, University of California, San Diego, School of Medicine, La Jolla, CA, USA.
  • Chongyuan Luo
    Genomic Analysis Laboratory, The Salk Institute for Biological Studies, La Jolla, CA, USA.
  • Rosa Castanon
    Genomic Analysis Laboratory, The Salk Institute for Biological Studies, La Jolla, CA, USA.
  • Justin Sandoval
    Genomic Analysis Laboratory, The Salk Institute for Biological Studies, La Jolla, CA, USA.
  • Angeline Rivkin
    Genomic Analysis Laboratory, The Salk Institute for Biological Studies, La Jolla, CA, USA.
  • Joseph R Nery
    Genomic Analysis Laboratory, The Salk Institute for Biological Studies, La Jolla, CA, USA.
  • Margarita M Behrens
    Computational Neurobiology Laboratory, The Salk Institute for Biological Studies, La Jolla, CA, USA.
  • Joseph R Ecker
    Genomic Analysis Laboratory, The Salk Institute for Biological Studies, La Jolla, CA, USA.
  • Bing Ren
    Ludwig Institute for Cancer Research, La Jolla, CA, 92093, USA.
  • Joshua D Welch
    Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA. welchjd@umich.edu.