Evaluation of data discretization methods to derive platform independent isoform expression signatures for multi-class tumor subtyping.

Journal: BMC Genomics
PMID:

Abstract

BACKGROUND: Many supervised learning algorithms have been applied to derive gene signatures for patient stratification from gene expression data. However, transferring multi-gene signatures from one analytical platform to another without loss of classification accuracy is a major challenge. Here, we compared three unsupervised data discretization methods (equal-width binning, equal-frequency binning, and k-means clustering) for their ability to accurately classify the four known subtypes of glioblastoma multiforme (GBM) when the classification algorithms were trained on isoform-level gene expression profiles from an exon-array platform and tested on the corresponding profiles from RNA-seq data.
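
To make the three discretization methods concrete, the following is a minimal sketch in plain Python, not the authors' implementation. The function names, the choice of three bins, the quantile-based k-means initialization, and the sample values are all assumptions made for illustration; the abstract does not specify any of them. Each function maps a list of expression values for one feature to integer bin labels (0 = lowest).

```python
def equal_width_bins(values, n_bins):
    """Split the observed range into n_bins intervals of equal width."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / n_bins
    # The maximum value would fall just past the last interval, so clamp it.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def equal_frequency_bins(values, n_bins):
    """Assign bins by rank so each bin holds roughly the same number of values.

    Note: tied values can land in different bins in this simple version.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    labels = [0] * n
    for rank, i in enumerate(order):
        labels[i] = rank * n_bins // n
    return labels


def kmeans_bins(values, n_bins, n_iter=100):
    """One-dimensional k-means (Lloyd's algorithm) with ordered bin labels.

    Assumption: centroids start at evenly spaced quantiles of the data,
    which keeps the result deterministic for this illustration.
    """
    n = len(values)
    ordered = sorted(values)
    centers = [ordered[(2 * k + 1) * n // (2 * n_bins)] for k in range(n_bins)]
    labels = [0] * n
    for _ in range(n_iter):
        # Assignment step: each value goes to its nearest centroid.
        labels = [
            min(range(n_bins), key=lambda k: abs(v - centers[k])) for v in values
        ]
        # Update step: each centroid moves to the mean of its members.
        new_centers = []
        for k in range(n_bins):
            members = [v for v, lab in zip(values, labels) if lab == k]
            new_centers.append(sum(members) / len(members) if members else centers[k])
        if new_centers == centers:
            break
        centers = new_centers
    return labels


# Hypothetical expression values for a single isoform across eight samples.
expression = [1.0, 2.0, 3.0, 4.0, 20.0, 22.0, 90.0, 95.0]
print(equal_width_bins(expression, 3))      # [0, 0, 0, 0, 0, 0, 2, 2]
print(equal_frequency_bins(expression, 3))  # [0, 0, 0, 1, 1, 1, 2, 2]
print(kmeans_bins(expression, 3))           # [0, 0, 0, 0, 1, 1, 2, 2]
```

On this skewed toy input the three methods disagree: equal-width binning leaves the middle bin empty, equal-frequency binning forces three values into it, and k-means groups the values by proximity. In a cross-platform setting one would presumably discretize each platform's values separately, so that bin labels rather than raw intensities are compared; that reading is an assumption here, not something the abstract spells out.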

Authors

  • Segun Jung
  • Yingtao Bi
  • Ramana V Davuluri