Towards Comprehensive Clinical Abbreviation Disambiguation Using Machine-Labeled Training Data.

Journal: AMIA ... Annual Symposium proceedings. AMIA Symposium
Published Date:

Abstract

Fully supervised machine learning methods handle abbreviation disambiguation in clinical texts well. Acquiring training data, however, is expensive and would be impractical for the large numbers of abbreviations found in specialized corpora. An alternative is a semi-supervised approach, in which training data are generated automatically by replacing long forms in natural text with their corresponding abbreviations. Most prior implementations of this method either focus on very few abbreviations or do not test on real-world data. We present a realistic use case by testing several semi-supervised classification algorithms on a large hand-annotated set of medical records containing occurrences of 74 ambiguous abbreviations. Despite notable differences between the training and test corpora, the classifiers achieve up to 90% accuracy. Our tests demonstrate that semi-supervised abbreviation disambiguation is a viable and extensible option for medical NLP systems.
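
The substitution step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the sense inventory `SENSES`, the function `make_examples`, and the example sentences are all hypothetical, chosen only to show how a long form found in running text might be swapped for its abbreviation while the long form is kept as the training label.

```python
import re

# Hypothetical sense inventory (an assumption for illustration):
# each abbreviation maps to its candidate long forms.
SENSES = {
    "RA": ["rheumatoid arthritis", "right atrium"],
}


def make_examples(sentences, senses):
    """Build machine-labeled examples by long-form substitution.

    Returns a list of (text_with_abbreviation, abbreviation, label) triples,
    where the label is the long form that was replaced.
    """
    examples = []
    for sentence in sentences:
        for abbr, long_forms in senses.items():
            for long_form in long_forms:
                # Match the long form as a whole phrase, ignoring case.
                pattern = re.compile(
                    r"\b" + re.escape(long_form) + r"\b", re.IGNORECASE
                )
                if pattern.search(sentence):
                    # Replace only the first occurrence so each example
                    # carries exactly one labeled abbreviation.
                    replaced = pattern.sub(abbr, sentence, count=1)
                    examples.append((replaced, abbr, long_form))
    return examples


# Hypothetical input sentences standing in for natural clinical text.
sentences = [
    "Patient has a history of rheumatoid arthritis.",
    "The catheter was advanced into the right atrium.",
    "No acute distress.",
]

examples = make_examples(sentences, SENSES)
for text, abbr, label in examples:
    print(f"{abbr} -> {label}: {text}")
```

Each resulting triple could then serve as a labeled instance for any standard classifier, with features drawn from the words surrounding the abbreviation. Sentences that contain no known long form contribute no examples.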

Authors

  • Gregory P Finley
    Institute for Health Informatics; Department of Surgery.
  • Serguei V S Pakhomov
    Institute for Health Informatics; College of Pharmacy, University of Minnesota, Minneapolis, MN.
  • Reed McEwan
    Academic Health Center-Information Systems, Minneapolis, MN, USA.
  • Genevieve B Melton
    Institute for Health Informatics, University of Minnesota, Minneapolis, Minnesota, USA.