Extracting Alcohol and Substance Abuse Status from Clinical Notes: The Added Value of Nursing Data.

Journal: Studies in health technology and informatics
Published Date:

Abstract

We applied an open-source natural language processing (NLP) system, "NimbleMiner", to identify clinical notes with mentions of alcohol and substance abuse. NimbleMiner allows users to rapidly discover clinical vocabularies (using a word embedding model) and then implement machine learning for text classification. We used a large inpatient dataset with over 50,000 intensive care unit admissions (MIMIC II). Clinical notes included physician-written discharge summaries (n = 51,201) and nursing notes (n = 412,343). We first used physician-written discharge summaries to train the system's algorithm, then added nursing notes to the physician-written discharge summaries and evaluated the algorithm's prediction accuracy. Adding nursing notes to the physician-written discharge summaries resulted in an almost two-fold vocabulary expansion. NimbleMiner slightly outperformed other state-of-the-art NLP systems (average F-score = 0.84), while requiring significantly less time for algorithm development. Our findings underline the importance of nursing data for the analysis of electronic patient records.
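
The vocabulary discovery step described above can be illustrated with a minimal sketch. This is not NimbleMiner's code: the function names, the toy three-dimensional vectors, and the similarity threshold are all hypothetical, chosen only to show how a word embedding model might suggest terms related to a seed term through cosine similarity. In practice the vectors would come from a model trained on the clinical notes.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def expand_vocabulary(seed_terms, embeddings, threshold=0.9):
    """Return non-seed terms whose vector is close to at least one seed term.

    `embeddings` maps each term to its vector; `threshold` is an
    illustrative cutoff, not a value taken from the study.
    """
    candidates = set()
    for term, vec in embeddings.items():
        if term in seed_terms:
            continue
        for seed in seed_terms:
            if seed in embeddings and cosine(vec, embeddings[seed]) >= threshold:
                candidates.add(term)
                break
    return sorted(candidates)


# Hypothetical toy vectors, invented for illustration only.
toy_embeddings = {
    "alcohol": [0.9, 0.1, 0.0],
    "etoh": [0.85, 0.15, 0.05],
    "drinks": [0.7, 0.3, 0.1],
    "fracture": [0.0, 0.2, 0.9],
}

suggested = expand_vocabulary({"alcohol"}, toy_embeddings, threshold=0.9)
print(suggested)
```

With these toy vectors, terms that point in nearly the same direction as the seed are suggested and unrelated terms are not. Adding another note type, such as nursing notes, would enlarge the set of terms available to the embedding model, which is one way to read the vocabulary expansion reported in the abstract.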

Authors

  • Maxim Topaz
    Division of General Internal Medicine and Primary Care, Brigham & Women's Hospital, Harvard Medical School, Boston, MA, USA.
  • Ludmila Murga
    Cheryl Spencer Department of Nursing, University of Haifa, Haifa, Israel.
  • Ofrit Bar-Bachar
    Cheryl Spencer Department of Nursing, University of Haifa, Haifa, Israel.
  • Kenrick Cato
    School of Nursing, Columbia University, New York City, NY, USA.
  • Sarah Collins
    Immunisation Department, Public Health England, London, UK.