Multi-batch single-cell comparative atlas construction by deep learning disentanglement.

Journal: Nature Communications
PMID:

Abstract

Cell state atlases constructed through single-cell RNA-seq and ATAC-seq analysis are powerful tools for analyzing the effects of genetic and drug treatment-induced perturbations on complex cell systems. Comparative analysis of such atlases can yield new insights into cell state and trajectory alterations. Perturbation experiments often require that single-cell assays be carried out in multiple batches, which can introduce technical distortions that confound the comparison of biological quantities between batches. Here we propose CODAL, a variational autoencoder-based statistical model that uses a mutual information regularization technique to explicitly disentangle factors related to technical and biological effects. We demonstrate CODAL's capacity for batch-confounded cell type discovery when applied to simulated datasets and embryonic development atlases with gene knockouts. CODAL improves the representation of RNA-seq and ATAC-seq modalities, yields interpretable modules of biological variation, and enables the generalization of other count-based generative models to multi-batch data.
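
The idea of penalizing statistical dependence between biological and technical factors can be illustrated with a small sketch. The abstract does not specify CODAL's mutual information estimator, so the example below substitutes a different, simpler dependence measure, the Hilbert-Schmidt Independence Criterion (HSIC) with an RBF kernel, purely as a stand-in. The function names (`rbf_kernel`, `hsic`), the variable names, and the simulated embeddings are all assumptions made for illustration and are not taken from the paper or its software.

```python
import numpy as np

def rbf_kernel(x, sigma=1.0):
    """RBF (Gaussian) kernel matrix for the rows of x, shape (n, p)."""
    sq = np.sum(x ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * x @ x.T  # pairwise squared distances
    return np.exp(-np.maximum(d2, 0.0) / (2.0 * sigma ** 2))

def hsic(x, y, sigma=1.0):
    """Biased HSIC estimate between paired samples x (n, p) and y (n, q).

    Used here as an illustrative dependence penalty; it is NOT the
    mutual information regularizer described in the abstract.
    """
    n = x.shape[0]
    h = np.eye(n) - np.ones((n, n)) / n  # centering matrix
    k = rbf_kernel(x, sigma)
    l = rbf_kernel(y, sigma)
    return np.trace(k @ h @ l @ h) / (n - 1) ** 2

# Simulated (hypothetical) per-cell embeddings.
rng = np.random.default_rng(0)
n_cells = 200
bio = rng.normal(size=(n_cells, 2))                         # "biological" factors
tech_independent = rng.normal(size=(n_cells, 2))            # unrelated technical factors
tech_entangled = bio + 0.1 * rng.normal(size=(n_cells, 2))  # technical factors leaking biology

penalty_independent = hsic(bio, tech_independent)
penalty_entangled = hsic(bio, tech_entangled)
print(f"independent: {penalty_independent:.4f}, entangled: {penalty_entangled:.4f}")
```

In a training loop, a term of this kind would be added to the variational autoencoder's loss so that the optimizer is discouraged from encoding the same information in both sets of factors. The penalty is larger for the entangled pair than for the independent pair.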

Authors

  • Allen W Lynch
    Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA.
  • Myles Brown
    Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA 02215, USA; Center for Functional Cancer Epigenetics, Dana-Farber Cancer Institute, Boston, MA 02215, USA. Electronic address: myles_brown@dfci.harvard.edu.
  • Clifford A Meyer
    Department of Data Science, Dana-Farber Cancer Institute, Boston, MA, USA. cliff_meyer@ds.dfci.harvard.edu.