Building simplified cancer subtyping and prediction models with glycan gene signatures.

Journal: Cell Reports Methods
Published Date:

Abstract

We identified a gene panel comprising 71 glycosyltransferases (GTs) that alter glycan patterns on cancer cells as they become more virulent. When these cancer-pattern GTs (CPGTs) were run through an algorithm trained on The Cancer Genome Atlas, they differentiated tumors from healthy tissue with 97% accuracy and clustered 27 cancers with 94% accuracy in external validation, revealing each variety's "biometric glycan ID." Using machine learning, we built four models for cancer classification, including two for detecting the molecular subtypes of breast cancer and glioma using even smaller CPGT sets. Our results reveal the power of using glyco-genes for diagnostics: Our breast cancer classifier was almost twice as effective in independent testing as the widely used prediction analysis of microarray 50 (PAM50) subtyping kit at differentiating between luminal A, luminal B, HER2-enriched, and basal-like breast cancers based on a comparable number of genes. Only four GT genes were needed to build a prognostic model for glioma survival.
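
The idea of classifying samples from the expression of a small gene panel can be illustrated with a minimal sketch. This is not the authors' algorithm, which the abstract does not specify. It is a plain nearest-centroid classifier (an unshrunken relative of the centroid approach behind PAM-style subtyping), run on synthetic data. The four-gene panel, the class means, the noise level, and all function names are assumptions made purely for illustration.

```python
import random
from statistics import mean


def make_synthetic_samples(n_per_class, class_means, noise_sd, rng):
    """Draw synthetic expression profiles (one value per panel gene).

    ASSUMPTION: Gaussian noise around made-up class means; no real
    TCGA or patient data is involved.
    """
    profiles, labels = [], []
    for label, means in class_means.items():
        for _ in range(n_per_class):
            profiles.append([rng.gauss(m, noise_sd) for m in means])
            labels.append(label)
    return profiles, labels


def fit_centroids(profiles, labels):
    """Average each gene's expression within each class."""
    centroids = {}
    for label in set(labels):
        rows = [p for p, l in zip(profiles, labels) if l == label]
        centroids[label] = [mean(col) for col in zip(*rows)]
    return centroids


def predict(centroids, profile):
    """Return the class whose centroid is nearest (squared Euclidean)."""
    def sq_dist(centroid):
        return sum((x - c) ** 2 for x, c in zip(profile, centroid))
    return min(centroids, key=lambda label: sq_dist(centroids[label]))


def accuracy(centroids, profiles, labels):
    """Fraction of profiles assigned to their true class."""
    hits = sum(predict(centroids, p) == l for p, l in zip(profiles, labels))
    return hits / len(labels)


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical mean expression of a four-gene panel in each class.
    class_means = {
        "tumor": [5.0, 1.0, 3.0, 0.5],
        "normal": [1.0, 4.0, 0.5, 3.0],
    }
    train_x, train_y = make_synthetic_samples(50, class_means, 1.0, rng)
    test_x, test_y = make_synthetic_samples(50, class_means, 1.0, rng)
    model = fit_centroids(train_x, train_y)
    print("held-out accuracy on synthetic data:", accuracy(model, test_x, test_y))
```

Scoring the model on samples that were not used to fit the centroids mirrors, in miniature, the independent testing the abstract reports. The accuracy printed here reflects only the synthetic setup and says nothing about the published figures.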

Authors

  • Jing Kai
    Bioscience Program, King Abdullah University of Science and Technology (KAUST), Biological and Environmental Sciences and Engineering (BESE) Division, Thuwal 23955-6900, Saudi Arabia.
  • Luyao Yang
  • Ayman F AbuElela
    Bioscience Program, King Abdullah University of Science and Technology (KAUST), Biological and Environmental Sciences and Engineering (BESE) Division, Thuwal 23955-6900, Saudi Arabia.
  • Alyaa M Abdel-Haleem
    Computer Science Program, King Abdullah University of Science and Technology (KAUST), Computational Bioscience Research Centre (CBRC), Thuwal 23955-6900, Saudi Arabia.
  • Asma S AlAmoodi
    Bioscience Program, King Abdullah University of Science and Technology (KAUST), Biological and Environmental Sciences and Engineering (BESE) Division, Thuwal 23955-6900, Saudi Arabia.
  • Abdulghani A Bin Nafisah
    Department of Medicine and Department of Molecular Oncology, King Faisal Specialist Hospital & Research Centre, Riyadh 11211, Saudi Arabia.
  • Alfadel Alshaibani
    Department of Hematology, SCT and Cellular Therapy, King Faisal Specialist Hospital & Research Centre, Riyadh 11211, Saudi Arabia.
  • Ali S Alzahrani
    Department of Medicine and Department of Molecular Oncology, King Faisal Specialist Hospital & Research Centre, Riyadh 11211, Saudi Arabia.
  • Vincenzo Lagani
    Gnosis Data Analysis PC, Heraklion, Greece.
  • David Gomez-Cabrero
    Departamento de Salud-Universidad Pública de Navarra, Translational Bioinformatics Unit, Navarra Biomed, Pamplona, Spain.
  • Xin Gao
    Department of Computer Science, New Jersey Institute of Technology, Newark, New Jersey, USA.
  • Jasmeen S Merzaban
    Bioscience Program, King Abdullah University of Science and Technology (KAUST), Biological and Environmental Sciences and Engineering (BESE) Division, Thuwal 23955-6900, Saudi Arabia; KAUST Center of Excellence for Smart Health (KCSH), King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Saudi Arabia. Electronic address: jasmeen.merzaban@kaust.edu.sa.