Towards Comprehensive Clinical Abbreviation Disambiguation Using Machine-Labeled Training Data.

Journal: AMIA ... Annual Symposium proceedings. AMIA Symposium

Published Date: Feb 10, 2017

Abstract

Abbreviation disambiguation in clinical texts is a problem handled well by fully supervised machine learning methods. Acquiring training data, however, is expensive and would be impractical for large numbers of abbreviations in specialized corpora. An alternative is a semi-supervised approach, in which training data are automatically generated by substituting long forms in natural text with their corresponding abbreviations. Most prior implementations of this method either focus on very few abbreviations or do not test on real-world data. We present a realistic use case by testing several semi-supervised classification algorithms on a large hand-annotated medical record of occurrences of 74 ambiguous abbreviations. Despite notable differences between training and test corpora, classifiers achieve up to 90% accuracy. Our tests demonstrate that semi-supervised abbreviation disambiguation is a viable and extensible option for medical NLP systems.
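
The data-generation step described in the abstract (substituting long forms in natural text with their abbreviations, so that the replaced long form serves as the label) might be sketched as follows. This is a minimal illustration only, not the authors' implementation: the sense inventory `SENSES`, the function `make_training_examples`, and the sample sentences are all hypothetical and are not taken from the paper.

```python
import re

# Hypothetical sense inventory (an assumption for illustration):
# each abbreviation maps to its candidate long forms.
SENSES = {
    "RA": ["rheumatoid arthritis", "right atrium"],
}


def make_training_examples(sentences, senses):
    """Build machine-labeled examples from unannotated text.

    Wherever a known long form occurs, it is replaced by its abbreviation,
    and the replaced long form is kept as the sense label.
    Returns a list of (text_with_abbreviation, abbreviation, label) tuples.
    """
    examples = []
    for sentence in sentences:
        for abbreviation, long_forms in senses.items():
            for long_form in long_forms:
                # Whole-phrase, case-insensitive match of the long form.
                pattern = re.compile(
                    r"\b" + re.escape(long_form) + r"\b", re.IGNORECASE
                )
                if pattern.search(sentence):
                    text = pattern.sub(abbreviation, sentence)
                    examples.append((text, abbreviation, long_form))
    return examples


# Made-up sentences standing in for a large unlabeled corpus.
sentences = [
    "Patient has a history of rheumatoid arthritis treated with methotrexate.",
    "Echo shows a dilated right atrium.",
    "No acute distress.",
]

examples = make_training_examples(sentences, SENSES)
for text, abbreviation, label in examples:
    print(f"{abbreviation} -> {label}: {text}")
```

The resulting (text, label) pairs could then be fed to any standard classifier; the choice of features and model is left open here because the abstract does not specify them.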

Authors

Gregory P Finley
Institute for Health Informatics; Department of Surgery.

Serguei V S Pakhomov
Institute for Health Informatics; College of Pharmacy, University of Minnesota, Minneapolis, MN.

Reed McEwan
Academic Health Center-Information Systems, Minneapolis, MN, USA.

Genevieve B Melton
Institute for Health Informatics, University of Minnesota, Minneapolis, Minnesota, USA.

Keywords

Abbreviations as Topic; Algorithms; Bayes Theorem; Electronic Health Records; Logistic Models; Natural Language Processing

External Resources

PubMed (PMID: 28269852)
