Secondary Use of Clinical Problem List Descriptions for Bi-Encoder Based ICD-10 Classification.

Journal: AMIA ... Annual Symposium proceedings. AMIA Symposium

Published Date: May 22, 2025

Abstract

Annotated language resources are essential for supervised machine learning methods. In the clinical domain, such data sets can boost use-case-specific natural language processing services. In this work, we analyzed a clinical problem list table consisting of millions of ICD-10 codes assigned to short problem list descriptions in German. We investigated whether these data form a valuable resource in a secondary-use scenario for coding support. Our proposed methodology uses an embedding-based k-NN classifier, which was evaluated on its coding performance with the multilingual BERT-based language model SapBERT-UMLS in comparison with medBERT.de, a model specifically tailored to medical and clinical language resources in German. Our approach reached a weighted F1-measure of 0.87 using SapBERT-UMLS and an F1-measure of 0.86 using medBERT.de. The approach yielded promising coding results when reusing annotated language resources from routine clinical documentation.
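The embedding-based k-NN idea described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the function names `embed` and `knn_predict` are hypothetical, the hashed character-trigram encoder is only a toy stand-in for a BERT-based encoder such as SapBERT-UMLS or medBERT.de, the German descriptions are invented examples rather than data from the study, and the choice of k and the similarity-weighted vote are not taken from the abstract.

```python
import math
import zlib
from collections import defaultdict


def embed(text, dim=256, n=3):
    """Toy stand-in for a sentence encoder (NOT SapBERT-UMLS or medBERT.de):
    hashed character n-gram counts, L2-normalised."""
    vec = [0.0] * dim
    padded = f"  {text.lower()}  "
    for i in range(len(padded) - n + 1):
        gram = padded[i:i + n]
        # Deterministic hash so results do not change between runs.
        vec[zlib.crc32(gram.encode("utf-8")) % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def knn_predict(query_vec, index, k=3):
    """Return the code with the largest similarity-weighted vote
    among the k nearest neighbours (cosine similarity on unit vectors)."""
    scored = sorted(
        ((sum(q * d for q, d in zip(query_vec, doc_vec)), code)
         for doc_vec, code in index),
        reverse=True,
    )
    votes = defaultdict(float)
    for sim, code in scored[:k]:
        votes[code] += sim
    return max(votes, key=votes.get)


# Invented problem list descriptions with ICD-10 codes (illustration only).
examples = [
    ("Diabetes mellitus Typ 2", "E11"),
    ("Typ-2-Diabetes", "E11"),
    ("DM Typ 2, insulinpflichtig", "E11"),
    ("Arterielle Hypertonie", "I10"),
    ("Essentielle Hypertonie", "I10"),
    ("Hypertonie, arteriell", "I10"),
    ("Vorhofflimmern", "I48"),
    ("Paroxysmales Vorhofflimmern", "I48"),
]

# The "index" is simply the list of embedded, already coded descriptions.
index = [(embed(text), code) for text, code in examples]

print(knn_predict(embed("Diabetes mellitus Typ II"), index, k=3))
```

In a setting like the one described, the toy `embed` function would be replaced by the sentence embeddings of the chosen language model, and the index would hold the embedded problem list descriptions together with their assigned codes.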

Authors

Markus Kreuzthaler
Institute of Medical Informatics, Statistics, and Documentation, Medical University of Graz, Austria.

Bastian Pfeifer
Institute for Medical Informatics, Statistics and Documentation, Medical University of Graz, Austria.

Stefan Schulz
Institute for Medical Informatics, Statistics and Documentation, Medical University of Graz, Austria.

Keywords

Clinical Coding; Humans; International Classification of Diseases; Machine Learning; Natural Language Processing; Unified Medical Language System

External Resources

PubMed (PMID: 40417589)
