The feature selection bias problem in relation to high-dimensional gene data.
Journal:
Artificial Intelligence in Medicine
Published Date:
Nov 14, 2015
Abstract
OBJECTIVE: Feature selection is a technique widely used in data mining. Its aim is to select the subset of features most relevant to the problem at hand. In this paper, we consider feature selection for the classification of gene datasets. Gene data usually consist of just a few dozen objects described by thousands of features. For this kind of data, it is easy to find a model that fits the learning data. However, it is not easy to find one that performs as well on new data as it does on the learning data. This overfitting issue is well known in classification and regression, but it also applies to feature selection.
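The selection bias described above can be illustrated with a small simulation. The sketch below is not the method of the paper; it is a minimal illustration under assumed settings: 40 objects, 5000 pure-noise features, a simple mean-difference score for ranking features, and a nearest-centroid classifier. All function names and parameter values are hypothetical. Because the labels are unrelated to the features, any honest accuracy estimate should stay near chance. Selecting features on the full dataset before cross-validation lets the held-out objects influence the selection, which typically inflates the estimate.

```python
import numpy as np

def top_k_features(X, y, k):
    """Return indices of the k features with the largest absolute
    class-mean difference, scaled by the overall standard deviation."""
    m0 = X[y == 0].mean(axis=0)
    m1 = X[y == 1].mean(axis=0)
    score = np.abs(m0 - m1) / (X.std(axis=0) + 1e-12)
    return np.argsort(score)[-k:]

def nearest_centroid_predict(X_train, y_train, X_test):
    """Assign each test object to the class with the closer centroid."""
    c0 = X_train[y_train == 0].mean(axis=0)
    c1 = X_train[y_train == 1].mean(axis=0)
    d0 = ((X_test - c0) ** 2).sum(axis=1)
    d1 = ((X_test - c1) ** 2).sum(axis=1)
    return (d1 < d0).astype(int)

def cv_accuracy(X, y, k, folds, select_inside):
    """Cross-validated accuracy.

    select_inside=False: features are chosen once on ALL data (biased).
    select_inside=True:  features are chosen on each training fold only.
    """
    n = len(y)
    fold_ids = np.arange(n) % folds
    global_feats = None if select_inside else top_k_features(X, y, k)
    correct = 0
    for f in range(folds):
        test = fold_ids == f
        train = ~test
        if select_inside:
            feats = top_k_features(X[train], y[train], k)
        else:
            feats = global_feats
        pred = nearest_centroid_predict(
            X[train][:, feats], y[train], X[test][:, feats]
        )
        correct += int((pred == y[test]).sum())
    return correct / n

# Assumed toy setting: few objects, many features, labels independent of data.
rng = np.random.default_rng(0)
X = rng.standard_normal((40, 5000))
y = np.array([0, 1] * 20)

biased = cv_accuracy(X, y, k=10, folds=5, select_inside=False)
unbiased = cv_accuracy(X, y, k=10, folds=5, select_inside=True)
print(f"selection on all data:     {biased:.2f}")
print(f"selection inside each fold: {unbiased:.2f}")
```

The only difference between the two estimates is where the feature ranking is computed. Keeping the selection step inside each training fold treats it as part of the model being evaluated, which is the usual way to avoid this bias.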
Keywords
Algorithms
Bias
Biomarkers, Tumor
Computational Biology
Data Mining
Databases, Genetic
Decision Support Techniques
Gene Expression Profiling
Gene Expression Regulation, Neoplastic
Humans
Linear Models
Oligonucleotide Array Sequence Analysis
Pattern Recognition, Automated
Reproducibility of Results
Support Vector Machine