Differential privacy-based evaporative cooling feature selection and classification with relief-F and random forests.

Journal: Bioinformatics (Oxford, England)

Published Date: Sep 15, 2017

Abstract

MOTIVATION: Classification of individuals into disease or clinical categories from high-dimensional biological data with low prediction error is an important challenge of statistical learning in bioinformatics. Feature selection can improve classification accuracy but must be incorporated carefully into cross-validation to avoid overfitting. Recently, feature selection methods based on differential privacy, such as differentially private random forests and reusable holdout sets, have been proposed. However, for domains such as bioinformatics, where the number of features is much larger than the number of observations (p ≫ n), these differential privacy methods are susceptible to overfitting.
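The point about incorporating feature selection into cross-validation can be illustrated with a small sketch. This is not the method of the paper; it is a generic, assumption-laden illustration using pure-noise data with p ≫ n, a simple mean-difference filter, and a nearest-centroid classifier. All function names (`make_noise_data`, `select_features`, `cv_accuracy`) and parameter values are hypothetical choices made for this example only.

```python
import random

def make_noise_data(n, p, seed):
    """Gaussian noise features with balanced labels that carry no signal."""
    rng = random.Random(seed)
    X = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
    y = [i % 2 for i in range(n)]
    return X, y

def class_means(X, y, rows, cols):
    """Per-class mean of each column in cols, computed over the given rows."""
    means = {}
    for label in (0, 1):
        members = [i for i in rows if y[i] == label]
        means[label] = [sum(X[i][j] for i in members) / len(members) for j in cols]
    return means

def select_features(X, y, rows, k):
    """Keep the k features with the largest absolute class-mean difference."""
    p = len(X[0])
    m = class_means(X, y, rows, range(p))
    scores = [abs(m[0][j] - m[1][j]) for j in range(p)]
    return sorted(range(p), key=lambda j: scores[j], reverse=True)[:k]

def predict(X, y, train_rows, cols, i):
    """Nearest-centroid prediction for sample i using only the columns in cols."""
    m = class_means(X, y, train_rows, cols)
    dist = {label: sum((X[i][j] - c) ** 2 for j, c in zip(cols, m[label]))
            for label in (0, 1)}
    return 0 if dist[0] <= dist[1] else 1

def cv_accuracy(X, y, k, folds, select_inside):
    """k-fold CV accuracy; selection uses either training folds only or all data."""
    all_rows = list(range(len(X)))
    global_cols = select_features(X, y, all_rows, k)  # sees the test samples (leaky)
    correct = 0
    for f in range(folds):
        test_rows = [i for i in all_rows if i % folds == f]
        train_rows = [i for i in all_rows if i % folds != f]
        cols = select_features(X, y, train_rows, k) if select_inside else global_cols
        correct += sum(predict(X, y, train_rows, cols, i) == y[i] for i in test_rows)
    return correct / len(X)

X, y = make_noise_data(n=40, p=500, seed=0)
leaky = cv_accuracy(X, y, k=10, folds=5, select_inside=False)
honest = cv_accuracy(X, y, k=10, folds=5, select_inside=True)
print(f"selection outside CV: {leaky:.2f}, selection inside CV: {honest:.2f}")
```

Because the labels are unrelated to the features, any accuracy well above chance in the first number would reflect information leaking from the held-out samples into the selection step, which is the overfitting risk the abstract describes.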

Authors

Trang T Le
Department of Biostatistics, Epidemiology, and Informatics.

W Kyle Simmons
Laureate Institute for Brain Research, Tulsa, OK 74136, USA.

Masaya Misaki
Laureate Institute for Brain Research, Tulsa, OK 74136, USA.

Jerzy Bodurka
Laureate Institute for Brain Research, Tulsa, OK 74136, USA.

Bill C White
Tandy School of Computer Science, University of Tulsa, OK 74104, USA.

Jonathan Savitz
Laureate Institute for Brain Research, Tulsa, OK 74136, USA.

Brett A McKinney
Department of Mathematics, University of Tulsa, Tulsa, OK 74104, USA.

Keywords

Classification; Computational Biology; Depressive Disorder, Major; Humans; Machine Learning; Models, Biological; Privacy; Software

External Resources

PubMed ID: 28472232
