fastJT: An R package for robust and efficient feature selection for machine learning and genome-wide association studies.

Journal: BMC Bioinformatics

Published Date: Jun 13, 2019

Abstract

BACKGROUND: Parametric feature selection methods for machine learning and association studies based on genetic data are not robust with respect to outliers or influential observations. While rank-based, distribution-free statistics offer a robust alternative to parametric methods, their practical utility can be limited, as they demand significant computational resources when analyzing high-dimensional data. For genetic studies that seek to identify variants, the hypothesis is constrained, since it is typically assumed that the effect of the genotype on the phenotype is monotone (e.g., an additive genetic effect). Similarly, predictors for machine learning applications may have natural ordering constraints. Cross-validation for feature selection in these high-dimensional contexts necessitates highly efficient computational algorithms for the robust evaluation of many features.
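To make the idea of a rank-based statistic for a monotone (ordered) alternative concrete, here is a minimal Python sketch of the classical Jonckheere-Terpstra statistic, which the "JT" in the package name is assumed to refer to. It is not the fastJT implementation or API: the function names `jt_statistic` and `standardized_jt` are hypothetical, the loops are deliberately naive, and the standardization uses the textbook null mean and variance that hold only when there are no ties.

```python
from itertools import combinations
from math import sqrt


def jt_statistic(values, groups):
    """Jonckheere-Terpstra statistic for ordered group labels.

    Counts, over every pair of groups (lower, higher), the pairs of
    observations in which the lower-group value is smaller than the
    higher-group value; tied pairs count one half.
    """
    levels = sorted(set(groups))
    by_level = {
        g: [v for v, lab in zip(values, groups) if lab == g] for g in levels
    }
    stat = 0.0
    for low, high in combinations(levels, 2):
        for x in by_level[low]:
            for y in by_level[high]:
                if x < y:
                    stat += 1.0
                elif x == y:
                    stat += 0.5
    return stat


def standardized_jt(values, groups):
    """Standardize the statistic with the no-ties null mean and variance."""
    n = len(values)
    sizes = [list(groups).count(g) for g in sorted(set(groups))]
    mean = (n * n - sum(s * s for s in sizes)) / 4.0
    var = (n * n * (2 * n + 3) - sum(s * s * (2 * s + 3) for s in sizes)) / 72.0
    return (jt_statistic(values, groups) - mean) / sqrt(var)


# Hypothetical toy data: genotype coded 0/1/2, one quantitative phenotype.
genotype = [0, 0, 1, 1, 2, 2]
phenotype = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
with_outlier = [1.0, 2.0, 3.0, 4.0, 5.0, 1e6]

print(jt_statistic(phenotype, genotype))
print(jt_statistic(with_outlier, genotype))
print(standardized_jt(phenotype, genotype))
```

Because the statistic depends only on the ordering of the observations, replacing the largest phenotype value with an extreme outlier leaves it unchanged, which is the robustness property the background describes. The pairwise loops also show why a naive evaluation becomes expensive when repeated over many features, for example inside cross-validation.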

Authors

Jiaxing Lin
Department of Biostatistics and Bioinformatics, Duke University, Durham, NC, USA.

Alexander Sibley
Duke Cancer Institute, Duke University Medical Center, Durham, NC, USA.

Ivo Shterev
Duke Human Vaccine Institute, Duke University Medical Center, Durham, NC, USA.

Andrew Nixon
Duke Cancer Institute, Duke University Medical Center, Durham, NC, USA.

Federico Innocenti
Division of Pharmacotherapy and Experimental Therapeutics, Chapel Hill, NC, USA.

Cliburn Chan
Department of Biostatistics and Bioinformatics, Duke University, Durham, NC, USA.

Kouros Owzar
Department of Biostatistics and Bioinformatics, Duke University, Durham, NC, USA. Kouros.Owzar@duke.edu.

Keywords

Algorithms; Blood Proteins; Computer Simulation; Genome-Wide Association Study; Machine Learning; Polymorphism, Single Nucleotide; Quantitative Trait, Heritable

External Resources

PubMed ID: 31195980
