Deep learning enables accurate clustering with batch effect removal in single-cell RNA-seq analysis.

Journal: Nature Communications
PMID:

Abstract

Single-cell RNA sequencing (scRNA-seq) can characterize cell types and states through unsupervised clustering, but the ever-increasing number of cells and batch effects impose computational challenges. We present DESC, an unsupervised deep embedding algorithm that clusters scRNA-seq data by iteratively optimizing a clustering objective function. Through iterative self-learning, DESC gradually removes batch effects, as long as technical differences across batches are smaller than true biological variations. Because DESC is a soft clustering algorithm, its cluster assignment probabilities are biologically interpretable and can reveal both discrete and pseudotemporal structure of cells. Comprehensive evaluations show that DESC offers a proper balance of clustering accuracy and stability, has a small memory footprint, does not explicitly require batch information for batch effect removal, and can use a GPU when available. As the scale of single-cell studies continues to grow, we believe DESC will offer a valuable tool for biomedical researchers to disentangle complex cellular heterogeneity.
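The abstract does not spell out the clustering objective, so the following is only a minimal sketch of what one round of "iterative self-learning" with soft cluster assignments could look like. It assumes a deep-embedded-clustering style formulation: a Student's t kernel turns distances between embedded cells and cluster centroids into assignment probabilities, a sharpened target distribution is derived from those probabilities, and the Kullback-Leibler divergence between the two is the quantity to minimize. The function names (soft_assign, target_distribution, kl_divergence), the alpha parameter, and the toy data are all hypothetical and are not taken from the DESC software.

```python
import math


def soft_assign(embedded, centroids, alpha=1.0):
    """Soft assignment probabilities via a Student's t kernel (assumed form)."""
    probs = []
    for point in embedded:
        weights = []
        for centroid in centroids:
            # Squared Euclidean distance between the cell and the centroid.
            d2 = sum((a - b) ** 2 for a, b in zip(point, centroid))
            weights.append((1.0 + d2 / alpha) ** (-(alpha + 1.0) / 2.0))
        total = sum(weights)
        probs.append([w / total for w in weights])
    return probs


def target_distribution(q):
    """Sharpen q: square each probability, divide by the soft cluster size,
    then renormalize each row so it sums to 1."""
    n_clusters = len(q[0])
    cluster_size = [sum(row[j] for row in q) for j in range(n_clusters)]
    p = []
    for row in q:
        weights = [row[j] ** 2 / cluster_size[j] for j in range(n_clusters)]
        total = sum(weights)
        p.append([w / total for w in weights])
    return p


def kl_divergence(p, q):
    """KL(P || Q) summed over all cells and clusters."""
    return sum(
        p_ij * math.log(p_ij / q_ij)
        for p_row, q_row in zip(p, q)
        for p_ij, q_ij in zip(p_row, q_row)
        if p_ij > 0.0
    )


# Toy 2-D embedding with two well-separated groups (made-up values).
cells = [[0.0, 0.0], [0.2, 0.1], [3.0, 3.0], [2.9, 3.2]]
centroids = [[0.0, 0.0], [3.0, 3.0]]

q = soft_assign(cells, centroids)
p = target_distribution(q)
loss = kl_divergence(p, q)
```

In a full method, the loss would be minimized by gradient descent with respect to both the encoder weights and the centroids, and the target distribution would be recomputed periodically. That loop is omitted here because the abstract gives no details on it.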

Authors

  • Xiangjie Li
    Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, 19104, USA.
  • Kui Wang
    The Department of Mechanical Engineering, The University of Hong Kong, Pokfulam, Hong Kong.
  • Yafei Lyu
    Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, 19104, USA.
  • Huize Pan
    Division of Cardiology, Department of Medicine, Columbia University Medical Center, New York, NY, 10032, USA.
  • Jingxiao Zhang
    Center for Applied Statistics, School of Statistics, Renmin University of China, Beijing, 100872, China.
  • Dwight Stambolian
    Department of Ophthalmology, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, 19104, USA.
  • Katalin Susztak
    Departments of Medicine and Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, 19104, USA.
  • Muredach P Reilly
    Division of Cardiology, Department of Medicine, Columbia University Medical Center, New York, NY, 10032, USA.
  • Gang Hu
    Ping An Health Technology, Beijing, China.
  • Mingyao Li
    Department of Biostatistics, Epidemiology & Informatics, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA 19104, USA. mingyao@pennmedicine.upenn.edu.