From Text to Knowledge: An End-To-End Extraction Pipeline for Clinical Information.

Journal: Studies in health technology and informatics
Published Date:

Abstract

This study explores the use of Large Language Models (LLMs) to extract and structure allergic reaction data from non-English clinical free text. Using the open-source models Llama 3.1, Qwen 2.5, and Mistral NeMo, the study tests an end-to-end workflow on 500 anonymized German discharge letters from the University Hospital Schleswig-Holstein. The presented approach extracts allergy information, maps it to SNOMED CT codes, and formats it into HL7 FHIR-compliant resources. Results demonstrate high accuracy in substance detection and encoding, although some encoding errors arise from complex string-matching problems. Reaction identification proves more challenging, with mixed performance across models. The study highlights the potential for further improvement through model training on specific terminologies and enhanced prompt engineering. Overall, this proof of concept shows the promise of LLMs for varied, domain-specific healthcare tasks, enabling automation and scalability in handling complex clinical data.
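
The mapping and formatting steps described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the lookup tables, the function names, and the patient identifier are hypothetical, the codes are placeholders rather than real SNOMED CT concept IDs, and the normalized exact-match lookup is only one assumed way to do the string matching. It takes an already extracted substance and reaction list and builds a dictionary shaped like an HL7 FHIR AllergyIntolerance resource.

```python
import json

SNOMED_SYSTEM = "http://snomed.info/sct"  # canonical system URI for SNOMED CT

# Hypothetical lookup tables. The codes are placeholders, NOT real SNOMED CT IDs.
SUBSTANCE_CODES = {"penicillin": ("PLACEHOLDER-SUBSTANCE-1", "Penicillin")}
REACTION_CODES = {"urtikaria": ("PLACEHOLDER-REACTION-1", "Urticaria")}


def normalize(term):
    """Lower-case the term and collapse whitespace before lookup."""
    return " ".join(term.casefold().split())


def to_codeable_concept(term, table):
    """Return a CodeableConcept; add a coding only if the term is in the table."""
    concept = {"text": term}
    match = table.get(normalize(term))
    if match is not None:
        code, display = match
        concept["coding"] = [
            {"system": SNOMED_SYSTEM, "code": code, "display": display}
        ]
    return concept


def build_allergy_intolerance(patient_id, substance, reactions):
    """Assemble an AllergyIntolerance-shaped dict from extracted strings."""
    resource = {
        "resourceType": "AllergyIntolerance",
        "patient": {"reference": f"Patient/{patient_id}"},
        "code": to_codeable_concept(substance, SUBSTANCE_CODES),
    }
    if reactions:
        resource["reaction"] = [
            {
                "manifestation": [
                    to_codeable_concept(r, REACTION_CODES) for r in reactions
                ]
            }
        ]
    return resource


if __name__ == "__main__":
    # Hypothetical LLM output for one discharge letter.
    example = build_allergy_intolerance(
        "example", " Penicillin ", ["Urtikaria", "Dyspnoe"]
    )
    print(json.dumps(example, indent=2, ensure_ascii=False))
```

In this sketch, a term without a table entry (here "Dyspnoe") keeps only its free-text form, which mirrors the kind of encoding gap that string matching can leave.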

Authors

  • Mário Macedo
    Institute for Medical Informatics and Statistics, Kiel University and University Hospital Schleswig-Holstein, Kiel, Germany.
  • Joshua Wiedekopf
    Institute of Medical Biometry and Statistics, Section for Clinical Research-IT, University of Lübeck and University Hospital Schleswig-Holstein, Lübeck, Germany.
  • Tobias Hillmer
    Institute for Medical Informatics and Statistics, Kiel University and University Hospital Schleswig-Holstein, Kiel, Germany.
  • Björn Schreiweis
    Institute for Medical Informatics and Statistics, Kiel University and University Hospital Schleswig-Holstein, Campus Kiel, Kiel and Lübeck, Schleswig-Holstein, Germany.
  • Sylvia Saalfeld
    Department of Simulation and Graphics, Otto von Guericke University Magdeburg, Germany; Research Campus STIMULATE, Otto von Guericke University Magdeburg, Germany.
  • Hannes Ulrich
    IT for Clinical Research, Lübeck (ITCR-L), University of Lübeck, Germany.