The Harms of Class Imbalance Corrections for Machine Learning Based Prediction Models: A Simulation Study.

Journal: Statistics in medicine
PMID:

Abstract

INTRODUCTION: Risk prediction models are increasingly used in healthcare to aid clinical decision-making. In most clinical contexts, model calibration (i.e., the reliability of the risk estimates) is critical. Data available for model development are often imbalanced with respect to the modeled outcome (i.e., individuals with and without the event of interest are not equally prevalent in the data). Researchers commonly correct for class imbalance, yet the effect of such corrections on the calibration of machine learning models is largely unknown.

Authors

  • Alex Carriero
    Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht, The Netherlands.
  • Kim Luijken
    Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht, The Netherlands.
  • Anne de Hond
    Department of Digital Health, University Medical Center Utrecht, Utrecht University, Universiteitsweg 100, CG Utrecht, the Netherlands.
  • Karel G M Moons
    Julius Center for Health Sciences and Primary Care, and Cochrane Netherlands, University Medical Center Utrecht, Utrecht University, Utrecht, Netherlands.
  • Ben Van Calster
  • Maarten van Smeden
    Department of Clinical Epidemiology, Leiden University Medical Center, Leiden, the Netherlands.