A fast gene selection method for multi-cancer classification using multiple support vector data description.
Journal:
Journal of biomedical informatics
Published Date:
Dec 27, 2014
Abstract
For cancer classification problems based on gene expression, a dataset usually contains only a few dozen samples but thousands to tens of thousands of genes, many of which may be irrelevant. A robust feature selection algorithm is required to remove the irrelevant genes and keep the informative ones. Support vector data description (SVDD) has been applied to gene selection for many years. However, SVDD cannot handle problems with multiple classes, since it considers only the target class. In addition, applying SVDD to gene selection is time-consuming. This paper proposes a novel fast feature selection method based on multiple SVDD and applies it to multi-class microarray data. A recursive feature elimination (RFE) scheme is introduced to iteratively remove irrelevant features, so the proposed method is called multiple SVDD-RFE (MSVDD-RFE). To make full use of all classes for a given task, MSVDD-RFE independently selects a relevant gene subset for each class. The final selected gene subset is the union of these relevant gene subsets. The effectiveness and accuracy of MSVDD-RFE are validated by experiments on five publicly available microarray datasets. In these experiments, the proposed method is faster and more effective than the methods it is compared with.
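The overall structure described in the abstract (one data description per class, recursive elimination within each class, then a union of the per-class subsets) can be illustrated with a minimal sketch. This is not the authors' implementation, and several parts are assumptions made for illustration. The description step uses the degenerate linear SVDD obtained when the penalty is C = 1/n, where the constraints force every multiplier to 1/n and the sphere center equals the class mean. The ranking score (a gene's contribution to the squared distances from the center) is hypothetical, since the abstract does not state the paper's ranking criterion. The function names and the toy data are likewise invented for this example.

```python
from statistics import mean


def zscore_columns(X):
    """Standardize each gene (column) over all samples so spreads are comparable."""
    out_cols = []
    for col in zip(*X):
        m = mean(col)
        sd = (sum((v - m) ** 2 for v in col) / len(col)) ** 0.5
        out_cols.append([(v - m) / sd if sd > 0 else 0.0 for v in col])
    return [list(row) for row in zip(*out_cols)]


def fit_center(X_target, active):
    """Stand-in for SVDD training (assumption): with C = 1/n every multiplier
    is 1/n, so the linear SVDD center is the mean of the target class."""
    return {j: mean(row[j] for row in X_target) for j in active}


def spread(X_target, center, j):
    """Hypothetical ranking score: gene j's share of the summed squared
    distances between the target samples and the center."""
    return sum((row[j] - center[j]) ** 2 for row in X_target)


def rfe_one_class(X_target, n_keep):
    """Recursive feature elimination for a single target class."""
    active = set(range(len(X_target[0])))
    while len(active) > n_keep:
        center = fit_center(X_target, active)  # re-fit on surviving genes
        worst = max(sorted(active), key=lambda j: spread(X_target, center, j))
        active.remove(worst)  # drop the gene that inflates the sphere most
    return active


def msvdd_rfe_sketch(X, y, n_keep):
    """Select n_keep genes per class independently and return their union."""
    Z = zscore_columns(X)
    selected = set()
    for label in sorted(set(y)):
        target = [row for row, lab in zip(Z, y) if lab == label]
        selected |= rfe_one_class(target, n_keep)
    return sorted(selected)


# Toy data (invented): genes 0, 1, 2 are tight within classes 0, 1, 2
# respectively; gene 3 is noisy in every class.
X = [
    [5.0, 1.0, 9.0, 2.0],
    [5.1, 4.0, 3.0, 8.0],
    [1.0, 7.0, 2.0, 1.0],
    [3.0, 7.1, 8.0, 9.0],
    [0.0, 2.0, 4.0, 3.0],
    [2.5, 9.0, 4.1, 7.0],
]
y = [0, 0, 1, 1, 2, 2]

print(msvdd_rfe_sketch(X, y, n_keep=1))  # prints [0, 1, 2]
```

In this degenerate case the per-gene scores do not change between iterations, so the elimination loop reduces to a sort. With a full SVDD solver the center and multipliers would be re-estimated on the surviving genes each round, which is where the running cost mentioned in the abstract comes from.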
Keywords
Algorithms
Artificial Intelligence
Bayes Theorem
Colonic Neoplasms
Diagnosis, Computer-Assisted
Gene Expression
Gene Expression Profiling
Gene Expression Regulation, Leukemic
Gene Expression Regulation, Neoplastic
Humans
Leukemia
Models, Statistical
Neoplasms
Oligonucleotide Array Sequence Analysis
Pattern Recognition, Automated
Software
Support Vector Machine