Publicly available machine learning models for identifying opioid misuse from the clinical notes of hospitalized patients.

Journal: BMC Medical Informatics and Decision Making
Published Date:

Abstract

BACKGROUND: Automated de-identification methods for removing protected health information (PHI) from the source notes of the electronic health record (EHR) rely on building systems that recognize mentions of PHI in text, but these systems still cannot guarantee complete PHI removal. As an alternative to relying on de-identification systems, we propose two solutions: (1) mapping the corpus of documents to a standardized medical vocabulary (concept unique identifier [CUI] codes from the Unified Medical Language System), thereby eliminating PHI from the inputs to a machine learning model; and (2) training character-based machine learning models that obviate the need for a dictionary of input words or n-grams. We aim to test the performance of models with and without PHI in one use case: an opioid misuse classifier.
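The two input representations can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: `TOY_CUI_MAP` is a hypothetical phrase-to-concept table whose codes are placeholders rather than real UMLS CUIs (a real system would use a UMLS-based concept extractor), and the printable-ASCII character vocabulary and the `max_len` of 64 are arbitrary choices made for the example. Words that match no concept, such as a patient's name, never reach the CUI representation. The character encoder still sees the raw text, but its vocabulary is fixed before any notes are read, so no corpus-derived word or n-gram list is stored with the model.

```python
import re
import string

# Hypothetical phrase-to-concept table (assumption for illustration).
# The codes are placeholders, NOT real UMLS CUIs.
TOY_CUI_MAP = {
    ("heroin",): "CUI_A",
    ("opioid", "use", "disorder"): "CUI_B",
    ("withdrawal",): "CUI_C",
}
MAX_PHRASE_LEN = max(len(k) for k in TOY_CUI_MAP)


def to_cuis(note: str) -> list[str]:
    """Map a note to concept codes by greedy longest-match lookup.

    Tokens that match no concept (e.g. names, dates) are dropped,
    so they never reach the downstream model.
    """
    tokens = re.findall(r"[a-z]+", note.lower())
    cuis, i = [], 0
    while i < len(tokens):
        # Try the longest candidate phrase starting at token i first.
        for n in range(min(MAX_PHRASE_LEN, len(tokens) - i), 0, -1):
            code = TOY_CUI_MAP.get(tuple(tokens[i:i + n]))
            if code is not None:
                cuis.append(code)
                i += n
                break
        else:
            i += 1  # no concept starts at this token; skip it
    return cuis


# Fixed character vocabulary chosen before seeing any notes;
# index 0 is reserved for padding and unknown characters.
CHAR_VOCAB = {c: i + 1 for i, c in enumerate(string.printable)}


def to_char_ids(note: str, max_len: int = 64) -> list[int]:
    """Encode a note as character indices, truncated/padded to max_len.

    No word or n-gram dictionary is built from the corpus, so the
    vocabulary itself holds no strings taken from patient notes.
    """
    ids = [CHAR_VOCAB.get(c, 0) for c in note[:max_len]]
    return ids + [0] * (max_len - len(ids))


note = "John Smith, 45, admitted for heroin withdrawal; hx opioid use disorder."
print(to_cuis(note))  # ['CUI_A', 'CUI_C', 'CUI_B']
print(len(to_char_ids(note)))  # 64
```

Greedy longest-match lookup is a deliberately crude stand-in for a real concept extractor; it is used here only to show why the name "John Smith" disappears from the CUI input while still being present, character by character, in the character-level input.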

Authors

  • Brihat Sharma
    Department of Computer Science, Loyola University Chicago, Chicago, IL, USA.
  • Dmitriy Dligach
    Department of Public Health Sciences, Stritch School of Medicine, Loyola University Chicago, Maywood, IL, USA.
  • Kristin Swope
    Stritch School of Medicine, Loyola University Chicago, Maywood, IL, USA.
  • Elizabeth Salisbury-Afshar
    Center for Multi-System Solutions to the Opioid Epidemic, American Institutes for Research, Chicago, IL, USA.
  • Niranjan S Karnik
    Department of Psychiatry, Rush University Medical Center, Chicago, IL, USA.
  • Cara Joyce
    Loyola University Chicago, Chicago, IL, USA.
  • Majid Afshar
    Loyola University Chicago, Chicago, IL, USA.