Optimizing Data Extraction: Harnessing RAG and LLMs for German Medical Documents.

Journal: Studies in health technology and informatics

PMID: 39176948

Abstract

In medical data analysis, converting unstructured text documents into a structured format suitable for further use is a significant challenge. This study introduces an automated, locally deployed, privacy-preserving pipeline that uses open-source Large Language Models (LLMs) with a Retrieval-Augmented Generation (RAG) architecture to convert German-language medical documents containing sensitive health information into a structured format. Tested on a proprietary dataset of 800 original, unstructured medical reports, the pipeline achieved a data-extraction accuracy of up to 90% compared with data extracted manually by physicians and medical students. This highlights the pipeline's potential as a valuable tool for efficiently extracting relevant data from unstructured sources.
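The abstract describes the pipeline only at the architectural level: retrieve relevant passages from a report, let an LLM fill structured fields from them, and score the result against manual extraction. The sketch below illustrates that retrieve-then-generate loop under loudly stated assumptions. The abstract does not name the model, retriever, chunk size, prompt, or accuracy metric, so every function name and parameter here (`chunk`, `retrieve`, `build_prompt`, `extract`, `field_accuracy`, `size`, `overlap`, `k`) is hypothetical. Retrieval uses term-frequency cosine similarity as a dependency-free stand-in for embedding search, the `llm` argument is any callable wrapping a locally hosted model, and "accuracy" is assumed to mean field-level exact match after normalization.

```python
import json
import math
import re
from collections import Counter
from typing import Callable, Optional

def tokenize(text: str) -> list[str]:
    # Lowercased word tokens; in Python 3, \w also matches umlauts and ß.
    return re.findall(r"\w+", text.lower())

def chunk(report: str, size: int = 40, overlap: int = 10) -> list[str]:
    # Hypothetical chunking: overlapping windows of `size` whitespace-separated words.
    words = report.split()
    step = size - overlap
    return [" ".join(words[i:i + size])
            for i in range(0, max(len(words) - overlap, 1), step)]

def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity of term-frequency vectors (stand-in for embeddings).
    dot = sum(a[t] * b[t] for t in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    # Return the k chunks most similar to the query.
    q = Counter(tokenize(query))
    return sorted(chunks, key=lambda c: cosine(q, Counter(tokenize(c))),
                  reverse=True)[:k]

def build_prompt(field: str, question: str, context: list[str]) -> str:
    # Restrict the model to the retrieved text and ask for a single JSON key.
    joined = "\n---\n".join(context)
    return ("Answer only from the context. Reply as JSON "
            f'{{"{field}": <value or null>}}.\n\n'
            f"Context:\n{joined}\n\nQuestion: {question}")

def extract(report: str, fields: dict[str, str],
            llm: Callable[[str], str], k: int = 2) -> dict[str, Optional[str]]:
    # One retrieve-then-generate round per target field; `llm` is any local model wrapper.
    chunks = chunk(report)
    record: dict[str, Optional[str]] = {}
    for field, question in fields.items():
        prompt = build_prompt(field, question, retrieve(question, chunks, k))
        try:
            record[field] = json.loads(llm(prompt)).get(field)
        except (json.JSONDecodeError, AttributeError):
            record[field] = None  # unparseable answer is treated as missing
    return record

def _norm(value) -> Optional[str]:
    # Case- and whitespace-insensitive comparison key; None stays None.
    return None if value is None else str(value).strip().lower()

def field_accuracy(predicted: list[dict], gold: list[dict]) -> float:
    # Share of gold-standard fields whose normalized value matches the prediction
    # (a field left empty in both counts as a match).
    pairs = [(p.get(f), g[f]) for p, g in zip(predicted, gold) for f in g]
    correct = sum(_norm(a) == _norm(b) for a, b in pairs)
    return correct / len(pairs) if pairs else 0.0
```

Passing the model in as a plain callable keeps the sketch agnostic about which open-source LLM is used and where it runs, which is what allows the whole loop to stay on local infrastructure when handling sensitive reports.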

Authors

Yingding Wang

Department of Pediatrics, Dr. von Hauner Children's Hospital, University Hospital, LMU Munich, Munich, Germany.
Simon Leutner

Medical Technology and IT (MIT), University Hospital, LMU Munich, Munich, Germany.
Michael Ingrisch

Department of Radiology, Ludwig-Maximilians-University Munich, Munich, Germany.
Christoph Klein

Department of Pediatrics, Dr. von Hauner Children's Hospital, University Hospital, Ludwig-Maximilians-Universität München, Munich, Germany.
Ludwig Christian Hinske

Institute for Digital Medicine, University Hospital Augsburg, Augsburg, Germany.
Katharina Danhauser

Department of Pediatrics, Dr. von Hauner Children's Hospital, University Hospital, LMU Munich, Munich, Germany.

Keywords

Computer Security; Data Mining; Electronic Health Records; Germany; Humans; Information Storage and Retrieval; Natural Language Processing
