From Text to Knowledge: An End-To-End Extraction Pipeline for Clinical Information.

Journal: Studies in health technology and informatics

Published Date: Aug 7, 2025

Abstract

This study explores the use of Large Language Models (LLMs) in extracting and structuring allergic reaction data from non-English clinical free texts. Leveraging open-source models such as Llama 3.1, Qwen 2.5, and Mistral NeMo, the study utilizes 500 anonymized German discharge letters from the University Hospital Schleswig-Holstein to test an end-to-end workflow. The presented approach extracts allergy information, maps it to SNOMED CT codes, and formats it into HL7 FHIR-compliant resources. Results demonstrate high accuracy in substance detection and encoding, although encoding errors occur due to complex string-matching problems. Reaction identification proves challenging, with mixed performance across models. The study highlights the potential for further improvements through model training on specific terminologies and enhanced prompt engineering. Overall, this proof-of-concept shows the promise of LLMs for varied, domain-specific healthcare tasks, enabling automation and scalability in handling complex clinical data.

Authors

Mário Macedo
Institute for Medical Informatics and Statistics, Kiel University and University Hospital Schleswig-Holstein, Kiel, Germany.

Joshua Wiedekopf
Institute of Medical Biometry and Statistics, Section for Clinical Research-IT, University of Lübeck and University Hospital Schleswig-Holstein, Lübeck, Germany.

Tobias Hillmer
Institute for Medical Informatics and Statistics, Kiel University and University Hospital Schleswig-Holstein, Kiel, Germany.

Björn Schreiweis
Institute for Medical Informatics and Statistics, Kiel University and University Hospital Schleswig-Holstein, Campus Kiel, Kiel and Lübeck, Schleswig-Holstein, Germany.

Sylvia Saalfeld
Department of Simulation and Graphics, Otto von Guericke University Magdeburg, Germany; Research Campus STIMULATE, Otto von Guericke University Magdeburg, Germany.

Hannes Ulrich
IT for Clinical Research, Lübeck (ITCR-L), University of Lübeck, Germany.

Keywords

Data Mining; Electronic Health Records; Germany; Humans; Natural Language Processing; Systematized Nomenclature of Medicine

External Resources

PubMed ID: 40776022
