The feature selection bias problem in relation to high-dimensional gene data.

Journal: Artificial intelligence in medicine

Published Date: Nov 14, 2015

Abstract

OBJECTIVE: Feature selection is a technique widely used in data mining. The aim is to select the best subset of features relevant to the problem being considered. In this paper, we consider feature selection for the classification of gene datasets. Gene data usually consist of just a few dozen objects described by thousands of features. For this kind of data, it is easy to find a model that fits the learning data. However, it is not easy to find one that performs as well on new data as it does on the learning data. This overfitting issue is well known in classification and regression, but it also applies to feature selection.
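The overfitting effect described in the abstract can be illustrated with a small simulation. The sketch below is not the authors' method: it assumes pure-noise data (so no feature is truly informative), ranks features by the gap between class means, and classifies with a nearest-centroid rule, all of which are stand-ins chosen for brevity. The function names (`make_noise_data`, `select_features`, `cv_accuracy`) and the parameter values are hypothetical. It compares cross-validation where features are selected once on all objects against cross-validation where selection is repeated inside each training fold.

```python
import random


def make_noise_data(n_samples=40, n_features=2000, seed=0):
    """Few objects, many features, labels unrelated to the features (assumed setup)."""
    rng = random.Random(seed)
    X = [[rng.gauss(0.0, 1.0) for _ in range(n_features)] for _ in range(n_samples)]
    y = [i % 2 for i in range(n_samples)]  # balanced labels, independent of X
    return X, y


def class_means(X, y, rows, features):
    """Per-class mean of each listed feature, computed on the given rows only."""
    means = {}
    for label in (0, 1):
        members = [i for i in rows if y[i] == label]
        means[label] = [sum(X[i][j] for i in members) / len(members) for j in features]
    return means


def select_features(X, y, rows, k):
    """Keep the k features with the largest absolute gap between class means."""
    all_features = list(range(len(X[0])))
    m = class_means(X, y, rows, all_features)
    gap = [abs(m[0][j] - m[1][j]) for j in all_features]
    return sorted(all_features, key=lambda j: gap[j], reverse=True)[:k]


def predict(X, means, features, i):
    """Nearest-centroid rule restricted to the selected features."""
    dist = {
        label: sum((X[i][j] - c) ** 2 for j, c in zip(features, means[label]))
        for label in (0, 1)
    }
    return 0 if dist[0] <= dist[1] else 1


def cv_accuracy(X, y, k=10, n_folds=5, select_inside_fold=True):
    """Cross-validated accuracy with selection inside or outside the folds."""
    all_rows = list(range(len(X)))
    # Selecting on all rows lets the test objects influence the feature subset.
    global_features = None if select_inside_fold else select_features(X, y, all_rows, k)
    correct = 0
    for fold in range(n_folds):
        test = [i for i in all_rows if i % n_folds == fold]
        train = [i for i in all_rows if i % n_folds != fold]
        features = select_features(X, y, train, k) if select_inside_fold else global_features
        means = class_means(X, y, train, features)
        correct += sum(predict(X, means, features, i) == y[i] for i in test)
    return correct / len(X)


if __name__ == "__main__":
    X, y = make_noise_data()
    print("selection on all data:   ", cv_accuracy(X, y, select_inside_fold=False))
    print("selection inside folds:  ", cv_accuracy(X, y, select_inside_fold=True))
```

Because the labels carry no signal, any accuracy well above chance in the first setting can only come from the selection step having seen the test objects. Repeating the selection inside each training fold removes that leak, and the estimate should fall back toward 0.5.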

Authors

Jerzy Krawczuk

Faculty of Computer Science, Bialystok University of Technology, 45A Wiejska St., 15-351 Bialystok, Poland.

Tomasz Łukaszuk

Faculty of Computer Science, Bialystok University of Technology, 45A Wiejska St., 15-351 Bialystok, Poland. Electronic address: t.lukaszuk@pb.edu.pl.

Keywords

Algorithms; Bias; Biomarkers, Tumor; Computational Biology; Data Mining; Databases, Genetic; Decision Support Techniques; Gene Expression Profiling; Gene Expression Regulation, Neoplastic; Humans; Linear Models; Oligonucleotide Array Sequence Analysis; Pattern Recognition, Automated; Reproducibility of Results; Support Vector Machine

External Resources

PubMed ID: 26674595
