Thirteen Questions About Using Machine Learning in Causal Research (You Won't Believe the Answer to Number 10!).

Journal: American Journal of Epidemiology
PMID:

Abstract

Machine learning is gaining prominence in the health sciences, where much of its use has focused on data-driven prediction. However, machine learning can also be embedded within causal analyses, potentially reducing biases arising from model misspecification. Using a question-and-answer format, we provide an introduction and orientation for epidemiologists interested in using machine learning but concerned about potential bias or loss of rigor due to use of "black box" models. We conclude with sample software code that may lower the barrier to entry to using these techniques.

Authors

  • Stephen J Mooney
    Harborview Injury Prevention and Research Center, University of Washington, Seattle, Washington 98122, USA; email: sjm2186@uw.edu.
  • Alexander P Keil
    Occupational and Environmental Epidemiology Branch, Division of Cancer Epidemiology and Genetics, National Cancer Institute, NIH, 9609 Medical Center Drive, Rockville, MD 20850, USA.
  • Daniel J Westreich