Data-Driven Identification of Clinical Real-World Expressions Linked to ICD.

Journal: Studies in health technology and informatics
Published Date:

Abstract

A semi-structured clinical problem list containing ∼1.9 million de-identified entries linked to ICD-10 codes was used to identify closely related real-world expressions. A log-likelihood-based co-occurrence analysis generated seed terms, which were then integrated into a k-NN search over embedding representations generated with SapBERT.
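
The seed-term step can be illustrated with Dunning's log-likelihood ratio (G²), a common statistic for co-occurrence analysis. The exact contingency setup used in the study is not described here. This minimal sketch therefore assumes that a term "co-occurs" with an ICD-10 code when it appears in a problem-list entry linked to that code, and it keeps only positively associated terms. The function names (`g2`, `seed_terms`) and the toy entries are hypothetical.

```python
import math
from collections import Counter


def g2(k11, k12, k21, k22):
    """Dunning's log-likelihood ratio (G^2) for a 2x2 contingency table.

    k11: entries that contain term t and are linked to code c
    k12: entries that contain t but are linked to another code
    k21: entries linked to c that do not contain t
    k22: entries with neither t nor c
    """
    n = k11 + k12 + k21 + k22
    rows = (k11 + k12, k21 + k22)
    cols = (k11 + k21, k12 + k22)
    observed = ((k11, k12), (k21, k22))
    total = 0.0
    for i in range(2):
        for j in range(2):
            o = observed[i][j]
            if o > 0:  # cells with a zero count contribute nothing
                expected = rows[i] * cols[j] / n
                total += o * math.log(o / expected)
    return 2.0 * total


def seed_terms(entries, code, top_n=3):
    """Rank tokens by G^2 association with `code` and return the top_n as seed terms.

    entries: list of (tokens, icd_code) pairs, one per problem-list entry.
    """
    n = len(entries)
    n_code = sum(1 for _, c in entries if c == code)
    term_total = Counter()      # entries containing the term
    term_with_code = Counter()  # entries containing the term and linked to `code`
    for tokens, c in entries:
        for t in set(tokens):
            term_total[t] += 1
            if c == code:
                term_with_code[t] += 1
    scored = []
    for t, k11 in term_with_code.items():
        k12 = term_total[t] - k11
        k21 = n_code - k11
        k22 = n - k11 - k12 - k21
        # G^2 is two-sided, so keep only positive associations (observed > expected)
        if k11 * n > term_total[t] * n_code:
            scored.append((g2(k11, k12, k21, k22), t))
    scored.sort(reverse=True)  # highest G^2 first
    return [t for _, t in scored[:top_n]]


# Hypothetical toy problem list: tokenised entries with their ICD-10 codes
entries = [
    (["type", "2", "diabetes"], "E11"),
    (["diabetes", "mellitus", "type", "2"], "E11"),
    (["niddm"], "E11"),
    (["hypertension"], "I10"),
    (["arterial", "hypertension"], "I10"),
    (["type", "1", "diabetes"], "E10"),
]
print(seed_terms(entries, "E11"))
# ['2', 'niddm', 'mellitus']
```

In this toy data, "2" ranks highest for E11 because it separates type 2 diabetes entries from the E10 (type 1) entry, whereas "type" and "diabetes" occur under both codes and score lower.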

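The k-NN step can be sketched as a cosine-similarity search over embedding vectors. In practice the vectors would come from SapBERT (for example, the publicly released `cambridgeltl/SapBERT-from-PubMedBERT-fulltext` checkpoint on Hugging Face). Here, small hypothetical 3-dimensional vectors stand in for them so that the example runs without a model. How the seed terms enter the search is also an assumption: this sketch averages the seed-term embeddings into a single query vector. The names `cosine`, `centroid`, and `knn_from_seeds` are illustrative.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def centroid(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(dims) / len(vectors) for dims in zip(*vectors)]


def knn_from_seeds(seed_vecs, candidates, k=2):
    """Return the k candidate expressions closest to the centroid of the seed embeddings.

    seed_vecs: list of embedding vectors for the seed terms (assumed aggregation)
    candidates: dict mapping expression -> embedding vector
    """
    query = centroid(seed_vecs)
    scored = [(cosine(query, vec), expr) for expr, vec in candidates.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [expr for _, expr in scored[:k]]


# Hypothetical 3-d vectors standing in for SapBERT embeddings
seed_vecs = [[0.9, 0.1, 0.0], [0.8, 0.2, 0.1]]
candidates = {
    "DM type 2": [0.85, 0.15, 0.05],
    "diabetes mellitus II": [0.8, 0.25, 0.0],
    "arterial hypertension": [0.05, 0.1, 0.95],
    "essential HTN": [0.1, 0.0, 0.9],
}
print(knn_from_seeds(seed_vecs, candidates))
# ['DM type 2', 'diabetes mellitus II']
```

An exhaustive scan like this is only practical for small candidate sets. For ∼1.9 million entries, an approximate nearest-neighbor index would be the more usual choice.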
Authors

  • Amila Kugic
    Institute for Medical Informatics, Statistics and Documentation, Medical University of Graz, Graz, Austria.
  • Bastian Pfeifer
    Institute for Medical Informatics, Statistics and Documentation, Medical University of Graz, Austria.
  • Stefan Schulz
    Institute for Medical Informatics, Statistics and Documentation, Medical University of Graz, Austria.
  • Markus Kreuzthaler
    Institute for Medical Informatics, Statistics and Documentation, Medical University of Graz, Austria.