Automatically disambiguating medical acronyms with ontology-aware deep learning.

Journal: Nature Communications
Published Date:

Abstract

Modern machine learning (ML) technologies hold great promise for automating diverse clinical and research workflows; however, training them requires extensive hand-labelled datasets. Disambiguating abbreviations is important for automated clinical note processing, but broad deployment of ML for this task is restricted by the scarcity and imbalance of labelled training data. In this work, we present a method that improves a model's ability to generalize through novel data augmentation techniques that use information from biomedical ontologies, in the form of related medical concepts, together with global context information within the medical note. We train our model on a public dataset (MIMIC III) and test its performance on automatically generated and hand-labelled datasets from different sources (MIMIC III, CASI, i2b2). Together, these techniques boost the accuracy of abbreviation disambiguation by up to 17% on hand-labelled data, without sacrificing performance on a held-out test set from MIMIC III.
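
The abstract does not spell out how the ontology-based augmentation works, so the following is only a minimal sketch of one plausible reading: when an abbreviation expansion has too few training contexts, borrow contexts from concepts that an ontology lists as related. The function name `augment_with_related`, the `min_count` threshold, and all toy data are hypothetical and are not taken from the paper.

```python
import random

def augment_with_related(labelled, related, concept_contexts, min_count, seed=0):
    """Top up rare expansions with contexts of ontology-related concepts.

    labelled:         expansion -> list of context sentences (training data)
    related:          expansion -> list of related concept names (assumed to
                      come from a biomedical ontology)
    concept_contexts: concept name -> list of sentences mentioning it
    min_count:        hypothetical target number of samples per expansion
    Returns expansion -> list of (sentence, source), source in {"own", "borrowed"}.
    """
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    augmented = {}
    for expansion, contexts in labelled.items():
        samples = [(sentence, "own") for sentence in contexts]
        shortfall = min_count - len(contexts)
        # Pool every sentence available for the related concepts.
        pool = [
            sentence
            for concept in related.get(expansion, [])
            for sentence in concept_contexts.get(concept, [])
        ]
        if shortfall > 0 and pool:
            borrowed = rng.sample(pool, min(shortfall, len(pool)))
            samples += [(sentence, "borrowed") for sentence in borrowed]
        augmented[expansion] = samples
    return augmented

# Toy, invented data for the abbreviation "PE".
labelled = {
    "pulmonary embolism": ["ct chest showed acute pulmonary embolism"],
    "physical exam": [
        "physical exam unremarkable",
        "on physical exam lungs clear",
        "physical exam notable for edema",
    ],
}
related = {"pulmonary embolism": ["deep vein thrombosis", "pulmonary infarction"]}
concept_contexts = {
    "deep vein thrombosis": [
        "ultrasound confirmed deep vein thrombosis in left leg",
        "started heparin for deep vein thrombosis",
    ],
    "pulmonary infarction": ["wedge-shaped opacity suggests pulmonary infarction"],
}

result = augment_with_related(labelled, related, concept_contexts, min_count=3)
for expansion, samples in result.items():
    print(expansion, len(samples))
```

In this toy run the rare sense is topped up to the same size as the common one, while the common sense is left unchanged. Tagging borrowed samples separately is a design choice of the sketch, so that they could be down-weighted or filtered during training.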

Authors

  • Marta Skreta
    Department of Computer Science, University of Toronto, Toronto, Canada. martaskreta@cs.toronto.edu.
  • Aryan Arbabi
    Department of Computer Science, University of Toronto, Toronto, Canada.
  • Jixuan Wang
    School of Software, Harbin Institute of Technology, Harbin, China.
  • Erik Drysdale
    The Hospital for Sick Children, Toronto, Canada.
  • Jacob Kelly
    Department of Computer Science, University of Toronto, Toronto, Canada.
  • Devin Singh
    Department of Computer Science, University of Toronto, Toronto, Canada.
  • Michael Brudno
    Transplant AI Initiative, Ajmera Transplant Program, University Health Network, Toronto, ON, Canada.