Standard machine learning approaches outperform deep representation learning on phenotype prediction from transcriptomics data.

Journal: BMC bioinformatics
Published Date:

Abstract

BACKGROUND: The ability to confidently predict health outcomes from gene expression would catalyze a revolution in molecular diagnostics. Yet the goal of developing actionable, robust, and reproducible predictive signatures of phenotypes such as clinical outcome has been attained in almost no disease area. Here, we report a comprehensive analysis spanning prediction tasks that range from ulcerative colitis, atopic dermatitis, and diabetes to many cancer subtypes, for a total of 24 binary and multiclass prediction problems and 26 survival analysis tasks. We systematically investigate the influence of gene subsets, normalization methods, and prediction algorithms. Crucially, we also explore the novel use of deep representation learning methods on large transcriptomics compendia, such as GTEx and TCGA, to boost the performance of state-of-the-art methods. The resources and findings in this work should serve both as an up-to-date reference on attainable performance and as a benchmarking resource for further research.

Authors

  • Aaron M Smith
    Unlearn.AI, Inc., San Francisco, CA, USA. drams@unlearn.ai.
  • Jonathan R Walsh
    Unlearn.AI, Inc., San Francisco, CA, USA.
  • John Long
    Computational Sciences, Worldwide Research & Development, Pfizer Inc., Cambridge, MA, USA.
  • Craig B Davis
    Oncology Global Product Development, Pfizer Inc., San Diego, CA, USA.
  • Peter Henstock
    Business Technology, Pfizer Inc., Cambridge, MA, USA.
  • Martin R Hodge
    Inflammation and Immunology, Worldwide Research & Development, Pfizer Inc., Cambridge, MA, USA.
  • Mateusz Maciejewski
    Novartis Institutes for Biomedical Research, 250 Massachusetts Avenue, Cambridge, Massachusetts 02139, United States.
  • Xinmeng Jasmine Mu
    Oncology Research & Development, Worldwide Research & Development, Pfizer Inc., San Diego, CA, USA.
  • Stephen Ra
    Computational Sciences, Worldwide Research & Development, Pfizer Inc., Cambridge, MA, USA.
  • Shanrong Zhao
    Pfizer Worldwide Research and Development, Cambridge, MA, USA.
  • Daniel Ziemek
    Inflammation and Immunology, Pfizer Worldwide Research & Development, Berlin, Germany.
  • Charles K Fisher
    Unlearn.AI, Inc., San Francisco, CA, USA.