Instability of Variable-selection Algorithms Used to Identify True Predictors of an Outcome in Intermediate-dimension Epidemiologic Studies.
Journal:
Epidemiology (Cambridge, Mass.)
Published Date:
May 1, 2021
Abstract
BACKGROUND: Machine-learning algorithms are increasingly used in epidemiology to identify true predictors of a health outcome when many potential predictors are measured. However, these algorithms can produce different outputs when applied repeatedly to the same dataset, which can compromise research reproducibility. We aimed to illustrate that commonly used algorithms are unstable and, using the Least Absolute Shrinkage and Selection Operator (LASSO) as an example, that the choice of stabilization method is crucial.