Exploitation of surrogate variables in random forests for unbiased analysis of mutual impact and importance of features.

Journal: Bioinformatics (Oxford, England)

Published Date: Aug 1, 2023

Abstract

MOTIVATION: Random forest is a popular machine learning approach for the analysis of high-dimensional data because it is flexible and provides variable importance measures for the selection of relevant features. However, the complex relationships between the features are usually not considered for the selection and thus also neglected for the characterization of the analysed samples.

Authors

Lucas F Voges

Centre for the Study of Manuscript Cultures (CSMC), Universität Hamburg, Hamburg 20354, Germany.
Lukas C Jarren

Centre for the Study of Manuscript Cultures (CSMC), Universität Hamburg, Hamburg 20354, Germany.
Stephan Seifert

Institute of Medical Informatics and Statistics, Kiel University, University Hospital Schleswig-Holstein, Kiel, Germany.

Keywords

Bias; Gene Frequency; Machine Learning; Random Forest

External Resources

PubMed (37522865)
