Improving the performance and interpretability on medical datasets using graphical ensemble feature selection.
Journal: Bioinformatics (Oxford, England)
Published Date: Jun 3, 2024
Abstract
MOTIVATION: A major hindrance to using Machine Learning (ML) on medical datasets is the discrepancy between a large number of variables and small sample sizes. While multiple feature selection techniques have been proposed to avoid the resulting overfitting, ensemble techniques overall offer the best selection robustness. Yet current methods designed to combine different algorithms generally fail to leverage the dependencies identified by their components. Here, we propose Graphical Ensembling (GE), a graph-theory-based ensemble feature selection technique designed to improve the stability and relevance of the selected features.