Benchmarking missing-values approaches for predictive models on health databases.

Journal: GigaScience
Published Date:

Abstract

BACKGROUND: As databases grow larger, it becomes harder to fully control their collection, and they frequently come with missing values. These large databases are well suited to train machine learning models, e.g., for forecasting or to extract biomarkers in biomedical settings. Such predictive approaches can use discriminative, rather than generative, modeling and thus open the door to new missing-values strategies. Yet existing empirical evaluations of strategies to handle missing values have focused on inferential statistics.

Authors

  • Alexandre Perez-Lebel
    McConnell Brain Imaging Centre, The Neuro (Montreal Neurological Institute-Hospital), Faculty of Medicine, McGill University, 3801 University Street, Montreal, QC H3A 2B4, Canada.
  • Gael Varoquaux
    Parietal, INRIA, NeuroSpin, bat 145 CEA Saclay, 91191, Gif sur Yvette, France.
  • Marine Le Morvan
    Inria Saclay - Île-de-France, Parietal team, 1 Rue Honoré d'Estienne d'Orves, 91120 Palaiseau, France.
  • Julie Josse
    Inria Montpellier, Bâtiment 5, 860 Rue de St-Priest, 34090 Montpellier, France.
  • Jean-Baptiste Poline
    McGill University, 845 Sherbrooke St W, Montreal, Quebec H3A 0G4, Canada.