Instability of Variable-selection Algorithms Used to Identify True Predictors of an Outcome in Intermediate-dimension Epidemiologic Studies.

Journal: Epidemiology (Cambridge, Mass.)
Published Date:

Abstract

BACKGROUND: Machine-learning algorithms are increasingly used in epidemiology to identify true predictors of a health outcome when many potential predictors are measured. However, these algorithms can provide different outputs when repeatedly applied to the same dataset, which can compromise research reproducibility. We aimed to illustrate that commonly used algorithms are unstable and, using the example of Least Absolute Shrinkage and Selection Operator (LASSO), that stabilization method choice is crucial.

Authors

  • Solène Cadiou
    From the Team of Environmental Epidemiology, IAB, Institute for Advanced Biosciences, Inserm, CNRS, CHU-Grenoble-Alpes, University Grenoble-Alpes, Grenoble, France.
  • Rémy Slama