Benchmarking missing-values approaches for predictive models on health databases.
Journal: GigaScience
Published Date: Apr 15, 2022
Abstract
BACKGROUND: As databases grow larger, it becomes harder to fully control their collection, and they frequently come with missing values. These large databases are well suited to train machine learning models, e.g., for forecasting or to extract biomarkers in biomedical settings. Such predictive approaches can use discriminative, rather than generative, modeling and thus open the door to new missing-values strategies. Yet existing empirical evaluations of strategies to handle missing values have focused on inferential statistics.