Solo: Doublet Identification in Single-Cell RNA-Seq via Semi-Supervised Deep Learning.

Journal: Cell Systems
Published Date:

Abstract

Single-cell RNA sequencing (scRNA-seq) measurements of gene expression enable an unprecedented high-resolution view into cellular state. However, current methods often result in two or more cells that share the same cell-identifying barcode; these "doublets" violate the fundamental premise of single-cell technology and can lead to incorrect inferences. Here, we describe Solo, a semi-supervised deep learning approach that identifies doublets with greater accuracy than existing methods. Solo first embeds cells in an unsupervised manner using a variational autoencoder and then appends a feed-forward neural network layer to the encoder to form a supervised classifier. We train this classifier to distinguish simulated doublets from the observed data. Solo can be applied in combination with experimental doublet detection methods to further purify scRNA-seq data to true single cells. It is freely available from https://github.com/calico/solo. A record of this paper's transparent peer review process is included in the Supplemental Information.
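
The classifier described above is trained on simulated doublets mixed with the observed data. As a rough illustration only, the following Python sketch builds such a labelled training set. It assumes that a doublet is simulated by summing the raw count vectors of two randomly chosen observed cells; the abstract does not state how Solo generates its doublets, so this is an assumption. The function name `simulate_doublets` and the toy count matrix are hypothetical, and the variational autoencoder and classifier themselves are not shown.

```python
import random


def simulate_doublets(counts, n_doublets, seed=0):
    """Return (simulated, pairs): summed count vectors and their parent indices.

    Assumption for illustration: a doublet's expression profile is the
    element-wise sum of the raw counts of two distinct observed cells.
    """
    rng = random.Random(seed)
    simulated, pairs = [], []
    for _ in range(n_doublets):
        i, j = rng.sample(range(len(counts)), 2)  # two distinct parent cells
        simulated.append([a + b for a, b in zip(counts[i], counts[j])])
        pairs.append((i, j))
    return simulated, pairs


# Toy matrix: 4 cells x 3 genes of raw counts (made-up values).
counts = [[5, 0, 1], [0, 3, 2], [1, 1, 0], [4, 2, 2]]
doublets, pairs = simulate_doublets(counts, n_doublets=3)

# Training set for a doublet classifier:
# observed cells are labelled 0, simulated doublets are labelled 1.
features = counts + doublets
labels = [0] * len(counts) + [1] * len(doublets)
```

Note that in this setup the observed cells labelled 0 may themselves contain real, undetected doublets, so the labels are noisy by construction; the classifier's score on observed cells is what flags likely doublets.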

Authors

  • Nicholas J Bernstein
    Calico Life Sciences LLC, South San Francisco, CA, USA.
  • Nicole L Fong
    Calico Life Sciences LLC, South San Francisco, CA, USA.
  • Irene Lam
    Calico Life Sciences LLC, South San Francisco, CA, USA.
  • Margaret A Roy
    Calico Life Sciences LLC, South San Francisco, CA, USA.
  • David G Hendrickson
    Calico Life Sciences LLC, South San Francisco, CA, USA. Electronic address: dgh@calicolabs.com.
  • David R Kelley
    Calico Labs, South San Francisco, California 94080, USA.