Deep profiling of gene expression across 18 human cancers.

Journal: Nature Biomedical Engineering
PMID:

Abstract

Clinical and biological information in large datasets of gene expression across cancers could be tapped with unsupervised deep learning. However, difficulties associated with biological interpretability and methodological robustness have made this impractical. Here we describe an unsupervised deep-learning framework for the generation of low-dimensional latent spaces for gene-expression data from 50,211 transcriptomes across 18 human cancers. The framework, which we named DeepProfile, outperformed dimensionality-reduction methods with respect to biological interpretability and allowed us to unveil that genes that are universally important in defining latent spaces across cancer types control immune cell activation, whereas cancer-type-specific genes and pathways define molecular disease subtypes. By linking latent variables in DeepProfile to secondary characteristics of tumours, we discovered that mutation burden is closely associated with the expression of cell-cycle-related genes, and that the activity of biological pathways for DNA-mismatch repair and MHC class II antigen presentation is consistently associated with patient survival. We also found that tumour-associated macrophages are a source of survival-correlated MHC class II transcripts. Unsupervised learning can facilitate the discovery of biological insight from gene-expression data.

Authors

  • Wei Qiu
  • Ayse B Dincer
    Paul G. Allen School of Computer Science & Engineering, University of Washington, Seattle, WA 98195, USA.
  • Joseph D Janizek
    Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, Washington.
  • Safiye Celik
    Paul G. Allen School of Computer Science and Engineering, University of Washington, 185 E Stevens Way NE, Seattle, WA, 98195, USA.
  • Mikael J Pittet
    Department of Pathology and Immunology, University of Geneva, Geneva, Switzerland.
  • Kamila Naxerova
    Center for Systems Biology, Massachusetts General Hospital and Harvard Medical School, Boston, MA, USA. naxerova.kamila@mgh.harvard.edu.
  • Su-In Lee
    Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, Washington.