Exploitation of surrogate variables in random forests for unbiased analysis of mutual impact and importance of features.

Journal: Bioinformatics (Oxford, England)
Published Date:

Abstract

MOTIVATION: Random forest is a popular machine learning approach for the analysis of high-dimensional data because it is flexible and provides variable importance measures for the selection of relevant features. However, the complex relationships between the features are usually not considered for the selection and thus also neglected for the characterization of the analysed samples.

Authors

  • Lucas F Voges
    Centre for the Study of Manuscript Cultures (CSMC), Universität Hamburg, Hamburg 20354, Germany.
  • Lukas C Jarren
    Centre for the Study of Manuscript Cultures (CSMC), Universität Hamburg, Hamburg 20354, Germany.
  • Stephan Seifert
    Institute of Medical Informatics and Statistics, Kiel University, University Hospital Schleswig-Holstein, Kiel, Germany.