Cross-institution natural language processing for reliable clinical association studies: a methodological exploration.

Journal: Journal of clinical epidemiology
Published Date:

Abstract

OBJECTIVES: Natural language processing (NLP) of clinical notes in electronic medical records is increasingly used to extract patient characteristics that are otherwise sparsely available, so that their association with relevant health outcomes can be assessed. Manual data curation is resource-intensive, and NLP methods make these studies more feasible. However, the methodology for using NLP methods reliably in clinical research is understudied. The objective of this study is to investigate how NLP models could be used to extract study variables (specifically exposures) so that exposure-outcome association studies can be conducted reliably.

Authors

  • Madhumita Sushil
    Bakar Computational Health Sciences Institute, University of California, San Francisco, San Francisco, California, USA.
  • Atul J Butte
    Bakar Computational Health Sciences Institute, University of California San Francisco, San Francisco, CA.
  • Ewoud Schuit
    Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht University, Utrecht, The Netherlands.
  • Maarten van Smeden
    Department of Clinical Epidemiology, Leiden University Medical Center, Leiden, the Netherlands.
  • Artuur M Leeuwenberg
    Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht University, Utrecht, The Netherlands. Electronic address: a.m.leeuwenberg-15@umcutrecht.nl.