Reporting of demographic data and representativeness in machine learning models using electronic health records.

Journal: Journal of the American Medical Informatics Association : JAMIA
PMID:

Abstract

OBJECTIVE: The development of machine learning (ML) algorithms to address a variety of issues faced in clinical practice has increased rapidly. However, questions have arisen regarding biases in their development that can affect their applicability in specific populations. We sought to evaluate whether studies developing ML models from electronic health record (EHR) data report sufficient demographic data on the study populations to demonstrate representativeness and reproducibility.

Authors

  • Selen Bozkurt
    Department of Biostatistics and Medical Informatics, Akdeniz University Faculty of Medicine, 48000 Antalya, Turkey.
  • Eli M Cahan
    Department of Medicine, Stanford University, Stanford, California, USA.
  • Martin G Seneviratne
    Department of Biomedical Informatics, Stanford School of Medicine, CA.
  • Ran Sun
    Department of Medicine, Stanford University, Stanford, California, USA.
  • Juan A Lossio-Ventura
    Department of Medicine, Stanford University, Stanford, California, USA.
  • John P A Ioannidis
    Stanford Prevention Research Center, Department of Medicine, Stanford University, Stanford, California.
  • Tina Hernandez-Boussard
    Stanford Center for Biomedical Informatics Research, Stanford, California 94305, USA.