Scalable information extraction from free text electronic health records using large language models.

Journal: BMC Medical Research Methodology
PMID:

Abstract

BACKGROUND: A vast amount of potentially useful information, such as descriptions of patient symptoms, family history, and social history, is recorded as free-text notes in electronic health records (EHRs) but is difficult to extract reliably at scale, which limits its utility in research. This study aims to assess whether an "out of the box" implementation of open-source large language models (LLMs), without any fine-tuning, can accurately extract social determinants of health (SDoH) data from free-text clinical notes.

Authors

  • Bowen Gu
    Dana-Farber Cancer Institute, Boston, MA.
  • Vivian Shao
    Division of Pharmacoepidemiology and Pharmacoeconomics, Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, 1620 Tremont Street, Suite 3030-R, Boston, MA, 02120, USA.
  • Ziqian Liao
    Department of Biostatistics, Harvard T.H. Chan School of Public Health, Harvard University, Boston, MA, USA.
  • Valentina Carducci
    Department of Otorhinolaryngology - Head & Neck Surgery, Mayo Clinic, Rochester, MN, USA.
  • Santiago Romero Brufau
    Department of Otorhinolaryngology - Head & Neck Surgery, Mayo Clinic, Rochester, MN, USA.
  • Jie Yang
    Key Laboratory of Development and Maternal and Child Diseases of Sichuan Province, Department of Pediatrics, Sichuan University, Chengdu, China.
  • Rishi J Desai
    Division of Pharmacoepidemiology and Pharmacoeconomics, Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, Massachusetts.