Surrogate minimal depth as an importance measure for variables in random forests.

Journal: Bioinformatics (Oxford, England)
Published Date:

Abstract

MOTIVATION: The machine learning method random forest has been applied successfully to omics data, such as gene expression data, both for classification or regression and for selecting the variables that are important for prediction. However, complex relationships between predictor variables, in particular between causal predictor variables, make the results of currently applied variable selection techniques difficult to interpret.

Authors

  • Stephan Seifert
    Institute of Medical Informatics and Statistics, Kiel University, University Hospital Schleswig-Holstein, Kiel, Germany.
  • Sven Gundlach
    Institute of Medical Informatics and Statistics, Kiel University, University Hospital Schleswig-Holstein, Kiel, Germany.
  • Silke Szymczak
    Institute of Medical Informatics and Statistics, Kiel University, University Hospital Schleswig-Holstein, Kiel, Germany.