Classification of non-TCGA cancer samples to TCGA molecular subtypes using compact feature sets.

Journal: Cancer Cell
PMID:

Abstract

Molecular subtypes, such as those defined by The Cancer Genome Atlas (TCGA), delineate a cancer's underlying biology and offer hope of informing a patient's prognosis and treatment plan. However, most approaches used in the discovery of subtypes are not suitable for assigning subtype labels to new cancer specimens from other studies or clinical trials. Here, we address this barrier by applying five different machine learning approaches to multi-omic data from 8,791 TCGA tumor samples comprising 106 subtypes from 26 different cancer cohorts. The resulting models use small numbers of features to classify new samples into previously defined TCGA molecular subtypes, a step toward molecular subtype application in the clinic. We validate select classifiers using external datasets. Predictive performance and classifier-selected features yield insight into the different machine learning approaches and genomic data platforms. For each cancer and data type, we provide containerized versions of the top-performing models as a public resource.

Authors

  • Kyle Ellrott
    Biomedical Engineering, Oregon Health and Science University, 3181 S.W. Sam Jackson Park Road, Portland, OR, 97239-3098, USA. ellrott@ohsu.edu.
  • Christopher K Wong
    Department of Biomolecular Engineering, University of California, Santa Cruz, California, United States of America.
  • Christina Yau
    University of California, San Francisco, Department of Surgery, San Francisco, CA 94158, USA; Buck Institute for Research on Aging, Novato, CA 94945, USA.
  • Mauro A A Castro
    Bioinformatics and Systems Biology Laboratory, Federal University of Paraná, Curitiba, PR 81520-260, Brazil.
  • Jordan A Lee
    Oregon Health and Science University, Portland, OR 97239, USA.
  • Brian J Karlberg
    Oregon Health and Science University, Portland, OR 97239, USA.
  • Jasleen K Grewal
    Canada's Michael Smith Genome Sciences Centre, Vancouver, British Columbia, Canada.
  • Vincenzo Lagani
    Gnosis Data Analysis PC, Heraklion, Greece.
  • Bahar Tercan
    Institute for Systems Biology, 401 Terry Avenue North, Seattle, WA 98109, USA.
  • Verena Friedl
  • Toshinori Hinoue
    Department of Epigenetics, Van Andel Institute, Grand Rapids, MI 49503, USA.
  • Vladislav Uzunangelov
    Department of Biomolecular Engineering, University of California, Santa Cruz, California, United States of America.
  • Lindsay Westlake
    The Broad Institute of Harvard and MIT, Cambridge, MA 02142, USA; Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA 02215, USA.
  • Xavier Loinaz
    Department of Computer Science, Brown University, Providence, Rhode Island, USA.
  • Ina Felau
    Center for Cancer Genomics, National Cancer Institute, Bethesda, MD 20892, USA.
  • Peggy I Wang
    Center for Cancer Genomics, National Cancer Institute, Bethesda, MD 20892, USA.
  • Anab Kemal
    Center for Cancer Genomics, National Cancer Institute, Bethesda, MD 20892, USA.
  • Samantha J Caesar-Johnson
    Center for Cancer Genomics, National Cancer Institute, Bethesda, MD 20892, USA.
  • Ilya Shmulevich
    Institute for Systems Biology, Seattle, WA 98109, USA.
  • Alexander J Lazar
    The University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA.
  • Ioannis Tsamardinos
    Department of Computer Science, University of Huddersfield, UK.
  • Katherine A Hoadley
    University of North Carolina, Chapel Hill, NC 27599, USA.
  • A Gordon Robertson
    Canada's Michael Smith Genome Sciences Centre, BC Cancer, Vancouver, BC, Canada.
  • Theo A Knijnenburg
    Institute for Systems Biology, Seattle, WA 98109, USA.
  • Christopher C Benz
    Buck Institute for Research on Aging, Novato, CA 94945, USA.
  • Joshua M Stuart
    University of California, Santa Cruz, Santa Cruz, CA 95064, USA.
  • Jean C Zenklusen
    Center for Cancer Genomics, National Cancer Institute, Bethesda, MD 20892, USA.
  • Andrew D Cherniack
    The Eli and Edythe L. Broad Institute of Massachusetts Institute of Technology and Harvard University, Cambridge, MA 02142, USA; Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA 02215, USA.
  • Peter W Laird
    Van Andel Research Institute, Grand Rapids, MI 49503, USA.