CDSeq: A novel complete deconvolution method for dissecting heterogeneous samples using gene expression data.

Journal: PLoS computational biology
Published Date:

Abstract

Quantifying cell-type proportions and their corresponding gene expression profiles in tissue samples would enhance understanding of the contributions of individual cell types to the physiological states of the tissue. Current approaches that address tissue heterogeneity have drawbacks. Experimental techniques, such as fluorescence-activated cell sorting and single-cell RNA sequencing, are expensive. Computational approaches that use expression data from heterogeneous samples are promising, but most current methods estimate either cell-type proportions or cell-type-specific expression profiles and require the other as input. Although such partial deconvolution methods have been successfully applied to tumor samples, the additional input they require may be unavailable. We introduce a novel complete deconvolution method, CDSeq, that uses only RNA-Seq data from bulk tissue samples to simultaneously estimate both cell-type proportions and cell-type-specific expression profiles. Using several synthetic and real experimental datasets with known cell-type composition and cell-type-specific expression profiles, we compared CDSeq's complete deconvolution performance with that of seven other established deconvolution methods. Complete deconvolution using CDSeq represents a substantial technical advance over partial deconvolution approaches and will be useful for studying cell mixtures in tissue samples. CDSeq is available from its GitHub repository (MATLAB and Octave code): https://github.com/kkang7/CDSeq.
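The abstract contrasts partial deconvolution (estimating proportions given known profiles, or profiles given known proportions) with complete deconvolution (estimating both from bulk data alone). Under the commonly used linear mixing assumption, which the abstract does not state explicitly, a bulk expression matrix Y (genes × samples) is approximated by G·P, where G (genes × cell types) holds cell-type-specific profiles and P (cell types × samples) holds proportions. The sketch below illustrates complete deconvolution in that sense using non-negative matrix factorization (NMF) with Lee-Seung multiplicative updates. NMF is a generic stand-in swapped in for illustration only; it is not CDSeq's method, and the helper names `matmul`, `transpose`, and `nmf_deconvolve`, as well as the toy data, are hypothetical.

```python
import random

def matmul(A, B):
    """Product of an (n x k) matrix A and a (k x m) matrix B (lists of lists)."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def nmf_deconvolve(Y, k, iters=3000, seed=0, eps=1e-12):
    """Hypothetical helper, NOT CDSeq's algorithm.

    Factor bulk expression Y (genes x samples) as Y ~ G P with G >= 0
    (genes x k, cell-type-specific profiles) and P >= 0 (k x samples),
    using Lee-Seung multiplicative updates for squared error.
    Returns (profiles, props, totals): each profile column and each props
    column sums to 1, and Y[i][j] ~ sum_a profiles[i][a] * props[a][j] * totals[j].
    """
    rng = random.Random(seed)
    n, m = len(Y), len(Y[0])
    G = [[rng.random() + 0.1 for _ in range(k)] for _ in range(n)]
    P = [[rng.random() + 0.1 for _ in range(m)] for _ in range(k)]
    for _ in range(iters):
        # P <- P * (G^T Y) / (G^T G P), elementwise
        Gt = transpose(G)
        num, den = matmul(Gt, Y), matmul(matmul(Gt, G), P)
        P = [[P[a][j] * num[a][j] / (den[a][j] + eps) for j in range(m)]
             for a in range(k)]
        # G <- G * (Y P^T) / (G P P^T), elementwise
        Pt = transpose(P)
        num, den = matmul(Y, Pt), matmul(G, matmul(P, Pt))
        G = [[G[i][a] * num[i][a] / (den[i][a] + eps) for a in range(k)]
             for i in range(n)]
    # Rescale each profile to sum to 1 and push its scale into P (G P unchanged).
    scale = [sum(G[i][a] for i in range(n)) + eps for a in range(k)]
    G = [[G[i][a] / scale[a] for a in range(k)] for i in range(n)]
    P = [[P[a][j] * scale[a] for j in range(m)] for a in range(k)]
    # Split each sample into its total expression and proportions summing to 1.
    totals = [sum(P[a][j] for a in range(k)) for j in range(m)]
    props = [[P[a][j] / totals[j] for j in range(m)] for a in range(k)]
    return G, props, totals

# Toy mixture: 6 genes, 2 cell types, 4 samples (genes 0-1 mark type A, 2-3 type B).
G_true = [[0.30, 0.01], [0.25, 0.02], [0.02, 0.35],
          [0.01, 0.30], [0.22, 0.12], [0.20, 0.20]]
P_true = [[0.8, 0.6, 0.3, 0.1],
          [0.2, 0.4, 0.7, 0.9]]
Y = [[1000 * v for v in row] for row in matmul(G_true, P_true)]

profiles, props, totals = nmf_deconvolve(Y, k=2)
Y_hat = [[sum(profiles[i][a] * props[a][j] for a in range(2)) * totals[j]
          for j in range(4)] for i in range(6)]
rel_err = (sum((Y[i][j] - Y_hat[i][j]) ** 2 for i in range(6) for j in range(4))
           / sum(Y[i][j] ** 2 for i in range(6) for j in range(4))) ** 0.5
print(f"relative reconstruction error: {rel_err:.4f}")
```

NMF solutions are generally not unique, so the recovered cell-type labels are arbitrary and the estimated proportions need not match `P_true` exactly even when the reconstruction error is small. The proportions in this sketch are shares of total expression, which can differ from cell-count proportions when cell types differ in RNA content.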

Authors

  • Kai Kang
    Biostatistics and Computational Biology Branch, National Institute of Environmental Health Sciences, Durham, North Carolina, United States of America.
  • Qian Meng
    Biostatistics and Computational Biology Branch, National Institute of Environmental Health Sciences, Durham, North Carolina, United States of America.
  • Igor Shats
    Signal Transduction Laboratory, National Institute of Environmental Health Sciences, Durham, North Carolina, United States of America.
  • David M Umbach
    Biostatistics and Computational Biology Branch, National Institute of Environmental Health Sciences, Durham, North Carolina, United States of America.
  • Melissa Li
    Biostatistics and Computational Biology Branch, National Institute of Environmental Health Sciences, Durham, North Carolina, United States of America.
  • Yuanyuan Li
    Key Laboratory of Environment and Health, Ministry of Education & Ministry of Environmental Protection, State Key Laboratory of Environmental Health (Incubation), School of Public Health, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China.
  • Xiaoling Li
    Department of Infections, Beijing Hospital of Traditional Chinese Medicine, Affiliated to the Capital Medical University, No. 23, Back Road of the Art Gallery, Dongcheng District, Beijing 100010, China.
  • Leping Li
    Biostatistics and Computational Biology Branch, National Institute of Environmental Health Sciences, Durham, North Carolina, United States of America.