Using recursive feature elimination in random forest to account for correlated variables in high dimensional data.

Journal: BMC genetics
PMID:

Abstract

BACKGROUND: Random forest (RF) is a machine-learning method that generally works well with high-dimensional problems and allows for nonlinear relationships between predictors; however, the presence of correlated predictors has been shown to reduce its ability to identify strong predictors. The Random Forest-Recursive Feature Elimination algorithm (RF-RFE) mitigates this problem in smaller data sets, but this approach has not been tested in high-dimensional omics data sets.

Authors

  • Burcu F Darst
    Department of Population Health Sciences, School of Medicine and Public Health, University of Wisconsin, 610 Walnut Street, 1007 WARF, Madison, WI, 53726, USA.
  • Kristen C Malecki
    Department of Population Health Sciences, School of Medicine and Public Health, University of Wisconsin, 610 Walnut Street, 1007 WARF, Madison, WI, 53726, USA.
  • Corinne D Engelman
    Department of Population Health Sciences, School of Medicine and Public Health, University of Wisconsin, 610 Walnut Street, 1007 WARF, Madison, WI, 53726, USA.