Improving the performance and interpretability on medical datasets using graphical ensemble feature selection.

Journal: Bioinformatics (Oxford, England)
Published Date:

Abstract

MOTIVATION: A major hindrance to using machine learning (ML) on medical datasets is the discrepancy between a large number of variables and small sample sizes. While multiple feature selection techniques have been proposed to avoid the resulting overfitting, ensemble techniques offer the best selection robustness overall. Yet current methods designed to combine different algorithms generally fail to leverage the dependencies identified by their components. Here, we propose Graphical Ensembling (GE), a graph-theory-based ensemble feature selection technique designed to improve the stability and relevance of the selected features.

Authors

  • Enzo Battistella
    U1030 Molecular Radiotherapy, Paris-Sud University - Gustave Roussy - Inserm - Paris-Saclay University, Villejuif, France; Department of Medical Physics, Gustave Roussy - Paris-Saclay University, Villejuif, France; MICS Laboratory, CentraleSupélec, Paris-Saclay University, Gif-sur-Yvette, France.
  • Dina Ghiassian
    Scipher Medicine, Waltham, MA 02453, United States.
  • Albert-László Barabási
    Center for Complex Networks Research (CCNR) and Department of Physics, Northeastern University, 177 Huntington Avenue, 11th floor, Boston, Massachusetts 02115, USA.