N-of-one differential gene expression without control samples using a deep generative model.

Journal: Genome Biology
Published Date:

Abstract

Differential analysis of bulk RNA-seq data often suffers from a lack of good controls. Here, we present a generative model, trained solely on healthy tissues, that replaces controls. The unsupervised model learns a low-dimensional representation and can identify the closest normal representation for a given disease sample. This enables control-free, single-sample differential expression analysis. In breast cancer, we demonstrate how our approach selects marker genes and outperforms a state-of-the-art method. Furthermore, significant genes identified by the model are enriched in driver genes across cancers. Our results show that the in silico closest normal provides a more favorable comparison than control samples.
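
The "closest normal" idea can be illustrated with a minimal sketch. This is not the authors' model: a toy linear decoder stands in for the trained deep generative model, and the squared-error objective, the gradient-descent search, and all names (`closest_normal`, `W`, `b`) are assumptions chosen for illustration. The sketch keeps the decoder fixed, searches the latent space for the representation whose decoded profile best matches a given sample, and reads the per-gene deviation from that profile as a single-sample differential signal.

```python
import numpy as np

def closest_normal(x, W, b, steps=500):
    """Search the latent space of a fixed linear decoder (W z + b) for the
    point whose decoded profile is nearest to the sample x (squared error).
    Hypothetical stand-in for a decoder trained on healthy tissue."""
    z = np.zeros(W.shape[1])
    # Step size 1/L, where L = 2 * sigma_max(W)^2 bounds the gradient's curvature.
    lr = 1.0 / (2.0 * np.linalg.norm(W, 2) ** 2)
    for _ in range(steps):
        residual = W @ z + b - x
        z -= lr * 2.0 * (W.T @ residual)  # gradient of ||W z + b - x||^2
    return z, W @ z + b

# Simulated data only; no real expression values are used here.
rng = np.random.default_rng(0)
n_genes, n_latent = 50, 3
W = rng.normal(size=(n_genes, n_latent))  # toy decoder weights
b = rng.normal(size=n_genes)              # toy decoder bias
z_true = rng.normal(size=n_latent)

healthy_like = W @ z_true + b             # a profile the decoder can reproduce
sample = healthy_like.copy()
sample[7] += 5.0                          # one artificially dysregulated gene

z_hat, normal = closest_normal(sample, W, b)
deviation = sample - normal               # per-gene difference from closest normal
top_gene = int(np.argmax(np.abs(deviation)))
print("gene with largest deviation:", top_gene)
```

In this toy setting the comparison needs no control samples: the reference profile is generated in silico for each individual sample. A real analysis would replace the squared error with a count-appropriate likelihood and attach a significance test to each gene's deviation.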

Authors

  • Iñigo Prada-Luengo
    Department of Computer Science, University of Copenhagen, Copenhagen, Denmark.
  • Viktoria Schuster
    Department of Health Technology, Section for Bioinformatics, Technical University of Denmark, DTU, 2800 Kgs. Lyngby, Denmark.
  • Yuhu Liang
    Department of Computer Science, University of Copenhagen, Copenhagen, Denmark.
  • Thilde Terkelsen
    Center for Health Data Science, University of Copenhagen, Copenhagen, Denmark.
  • Valentina Sora
    Department of Computer Science, University of Copenhagen, Copenhagen, Denmark.
  • Anders Krogh
    Department of Computer Science, University of Copenhagen, Copenhagen, Denmark.