A deep database of medical abbreviations and acronyms for natural language processing.

Journal: Scientific data
Published Date:

Abstract

The recognition, disambiguation, and expansion of medical abbreviations and acronyms are of utmost importance to prevent medically dangerous misinterpretation in natural language processing. To support recognition, disambiguation, and expansion, we present the Medical Abbreviation and Acronym Meta-Inventory, a deep database of medical abbreviations. A systematic harmonization of eight source inventories spanning multiple healthcare specialties and settings identified 104,057 abbreviations with 170,426 corresponding senses. Automated cross-mapping of synonymous records using state-of-the-art machine learning reduced redundancy, which simplifies future application. Additional features include semi-automated quality control to remove errors. The Meta-Inventory demonstrated high completeness (coverage) of abbreviations and senses in new clinical text, a substantial improvement over the next largest repository (a 6-14% increase in abbreviation coverage and a 28-52% increase in sense coverage). To our knowledge, the Meta-Inventory is the most complete compilation of medical abbreviations and acronyms in American English to date. Its multiple sources and high coverage support application across varied specialties and settings, enabling cross-institutional natural language processing, which previous inventories did not support. The Meta-Inventory is available at https://bit.ly/github-clinical-abbreviations .
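To make the two coverage measures concrete, the sketch below computes abbreviation coverage and sense coverage of an inventory against annotated (abbreviation, sense) occurrences from clinical text. It is a minimal illustration under stated assumptions, not the authors' evaluation code: the `coverage` function, the dictionary-of-sets inventory layout, token-level counting (every annotated occurrence counts once), and exact string matching without case normalization are all hypothetical choices made here for clarity.

```python
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical inventory layout: abbreviation -> set of known senses (long forms).
Inventory = Dict[str, Set[str]]


def coverage(inventory: Inventory,
             annotations: Iterable[Tuple[str, str]]) -> Tuple[float, float]:
    """Return (abbreviation coverage, sense coverage) over annotated occurrences.

    Assumptions (illustrative only): each (abbreviation, sense) pair is one
    occurrence in text, and matching is exact string comparison.
    """
    pairs: List[Tuple[str, str]] = list(annotations)
    if not pairs:
        return 0.0, 0.0
    # An occurrence is abbreviation-covered if the short form is in the inventory.
    abbr_hits = sum(1 for abbr, _ in pairs if abbr in inventory)
    # It is sense-covered only if the inventory also lists the intended long form.
    sense_hits = sum(1 for abbr, sense in pairs
                     if sense in inventory.get(abbr, set()))
    return abbr_hits / len(pairs), sense_hits / len(pairs)


# Toy data, invented for illustration.
inv: Inventory = {
    "RA": {"rheumatoid arthritis", "right atrium"},
    "BP": {"blood pressure"},
}
ann = [
    ("RA", "right atrium"),
    ("BP", "blood pressure"),
    ("CHF", "congestive heart failure"),  # abbreviation missing from inventory
    ("RA", "room air"),                   # abbreviation known, sense missing
]
print(coverage(inv, ann))  # (0.75, 0.5)
```

The toy data show why sense coverage can lag abbreviation coverage: "RA" is recognized in both of its occurrences, but the inventory lacks the "room air" sense, so only one of those two occurrences counts toward sense coverage.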

Authors

  • Lisa Grossman Liu
    Department of Biomedical Informatics, Columbia University, New York, NY, USA. lvg2104@cumc.columbia.edu.
  • Raymond H Grossman
    Kensho Technologies, LLC, Cambridge, MA, USA.
  • Elliot G Mitchell
    Department of Biomedical Informatics, Columbia University.
  • Chunhua Weng
    Department of Biomedical Informatics, Columbia University.
  • Karthik Natarajan
    Department of Biomedical Informatics, Columbia University, New York, NY, USA.
  • George Hripcsak
    Department of Biomedical Informatics, Columbia University, 622 W 168th Street, PH20, New York, NY 10032, USA; Medical Informatics Services, NewYork-Presbyterian Hospital, 622 W 168th Street, PH20, New York, NY 10032, USA. Electronic address: hripcsak@columbia.edu.
  • David K Vawdrey
    Value Institute NewYork-Presbyterian Hospital, New York, NY, USA.