Standard machine learning approaches outperform deep representation learning on phenotype prediction from transcriptomics data.

Journal: BMC Bioinformatics

Published Date: Mar 20, 2020

Abstract

BACKGROUND: The ability to confidently predict health outcomes from gene expression would catalyze a revolution in molecular diagnostics. Yet the goal of developing actionable, robust, and reproducible predictive signatures of phenotypes such as clinical outcome has been attained in almost no disease area. Here, we report a comprehensive analysis spanning prediction tasks from ulcerative colitis, atopic dermatitis, and diabetes to many cancer subtypes, for a total of 24 binary and multiclass prediction problems and 26 survival analysis tasks. We systematically investigate the influence of gene subsets, normalization methods, and prediction algorithms. Crucially, we also explore the novel use of deep representation learning methods on large transcriptomics compendia, such as GTEx and TCGA, to boost the performance of state-of-the-art methods. The resources and findings in this work should serve both as an up-to-date reference on attainable performance and as a benchmarking resource for further research.

Authors

Aaron M Smith
Unlearn.AI, Inc., San Francisco, CA, USA. drams@unlearn.ai.

Jonathan R Walsh
Unlearn.AI, Inc., San Francisco, CA, USA.

John Long
Computational Sciences, Worldwide Research & Development, Pfizer Inc., Cambridge, MA, USA.

Craig B Davis
Oncology Global Product Development, Pfizer Inc., San Diego, CA, USA.

Peter Henstock
Business Technology, Pfizer Inc., Cambridge, MA, USA.

Martin R Hodge
Inflammation and Immunology, Worldwide Research & Development, Pfizer Inc., Cambridge, MA, USA.

Mateusz Maciejewski
Novartis Institutes for Biomedical Research, 250 Massachusetts Avenue, Cambridge, Massachusetts 02139, United States.

Xinmeng Jasmine Mu
Oncology Research & Development, Worldwide Research & Development, Pfizer Inc., San Diego, CA, USA.

Stephen Ra
Computational Sciences, Worldwide Research & Development, Pfizer Inc., Cambridge, MA, USA.

Shanrong Zhao
Pfizer Worldwide Research and Development, Cambridge, MA, USA.

Daniel Ziemek
Inflammation and Immunology, Pfizer Worldwide Research & Development, Berlin, Germany.

Charles K Fisher
Unlearn.AI, Inc., San Francisco, CA, USA.

Keywords

Deep Learning; Disease; Gene Expression Profiling; Humans; Machine Learning; Phenotype; Supervised Machine Learning

External Resources

PubMed ID: 32197580
