From Text to Knowledge: An End-To-End Extraction Pipeline for Clinical Information.
Journal: Studies in Health Technology and Informatics
Published Date: Aug 7, 2025
Abstract
This study explores the use of Large Language Models (LLMs) for extracting and structuring allergic reaction data from non-English clinical free text. Using open-source models such as Llama 3.1, Qwen 2.5, and Mistral NeMo, the study tests an end-to-end workflow on 500 anonymized German discharge letters from the University Hospital Schleswig-Holstein. The presented approach extracts allergy information, maps it to SNOMED CT codes, and formats it into HL7 FHIR-compliant resources. Results show high accuracy in substance detection and encoding; the encoding errors that do occur stem from complex string-matching problems. Reaction identification proves more challenging, with mixed performance across models. The study highlights the potential for further improvement through model training on specific terminologies and enhanced prompt engineering. Overall, this proof of concept shows the promise of LLMs for varied, domain-specific healthcare tasks, enabling automation and scalability in handling complex clinical data.
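The three stages named in the abstract (extraction, SNOMED CT mapping, FHIR formatting) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the LLM call is replaced by a hard-coded JSON string, the field names `substance` and `reactions` are assumptions, the lookup tables are tiny hypothetical stand-ins for a terminology service, and the SNOMED CT codes shown should be verified against a current release. The resource follows the general shape of a FHIR R4 `AllergyIntolerance`.

```python
import json

SNOMED_SYSTEM = "http://snomed.info/sct"

# Hypothetical lookup tables; a real pipeline would query a terminology server.
# Codes are illustrative and should be checked against a SNOMED CT release.
SUBSTANCE_CODES = {"penicillin": ("91936005", "Allergy to penicillin")}
REACTION_CODES = {
    "exanthem": ("271807003", "Eruption of skin"),
    "urtikaria": ("126485001", "Urticaria"),
}


def normalize(term):
    """Lowercase and collapse whitespace so exact string matching is less brittle."""
    return " ".join(term.lower().split())


def to_codeable_concept(term, table):
    """Build a CodeableConcept; keep only the free text if no code is found."""
    concept = {"text": term}
    entry = table.get(normalize(term))
    if entry is not None:
        code, display = entry
        concept["coding"] = [
            {"system": SNOMED_SYSTEM, "code": code, "display": display}
        ]
    return concept


def build_allergy_intolerance(extracted, patient_id):
    """Format one extracted allergy as an AllergyIntolerance-shaped dict."""
    resource = {
        "resourceType": "AllergyIntolerance",
        "patient": {"reference": f"Patient/{patient_id}"},
        "code": to_codeable_concept(extracted["substance"], SUBSTANCE_CODES),
    }
    reactions = extracted.get("reactions", [])
    if reactions:
        resource["reaction"] = [
            {
                "manifestation": [
                    to_codeable_concept(r, REACTION_CODES) for r in reactions
                ]
            }
        ]
    return resource


# Stand-in for the structured output an LLM might return for one letter.
llm_output = '{"substance": "Penicillin", "reactions": ["Exanthem", "Atemnot"]}'
resource = build_allergy_intolerance(json.loads(llm_output), "example")
print(json.dumps(resource, indent=2, ensure_ascii=False))
```

In this sketch the unmapped reaction term ("Atemnot") is kept as plain text without a coding, which mirrors the kind of string-matching gap the abstract reports as a source of encoding errors.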