The Harms of Class Imbalance Corrections for Machine Learning Based Prediction Models: A Simulation Study.
Journal: Statistics in Medicine
PMID: 39865585
Abstract
INTRODUCTION: Risk prediction models are increasingly used in healthcare to aid in clinical decision-making. In most clinical contexts, model calibration (i.e., the reliability of risk estimates) is critical. Data available for model development are often not perfectly balanced with respect to the modeled outcome (i.e., individuals with vs. without the event of interest are not equally prevalent in the data). It is common for researchers to correct for class imbalance, yet the effect of such imbalance corrections on the calibration of machine learning models is largely unknown.