Evaluation of Chunking and Embedding Strategies for Local Document Retrieval Using an Open-Source LLM in a Hospital.

Journal: Studies in health technology and informatics

Abstract

INTRODUCTION: The integration of Retrieval-Augmented Generation (RAG) into domain-specific systems enables context-aware and traceable information retrieval. This study explores chunking and embedding strategies for a RAG-based question-answering system tailored to administrative documents at University Hospital Halle, focusing on model selection, parameter tuning, and retrieval performance. The insights gained are intended to serve as the foundation for a future RAG-based chatbot that gives hospital staff easier access to the contents of the hospital's document pool.
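The abstract does not specify which chunking method or embedding model the study uses. As a generic illustration only (not the authors' pipeline), here is a minimal sketch of the two parameters commonly tuned when chunking, the window size and the overlap between consecutive windows, followed by top-k retrieval ranked by cosine similarity. All function names are hypothetical, and `embed` is a bag-of-words stand-in assumed in place of a real embedding model.

```python
import math
from collections import Counter


def chunk_words(text: str, chunk_size: int, overlap: int) -> list[str]:
    """Split text into windows of chunk_size words; consecutive windows share `overlap` words."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("require chunk_size > 0 and 0 <= overlap < chunk_size")
    words = text.split()
    step = chunk_size - overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):  # last window already reaches the end
            break
    return chunks


def embed(text: str) -> Counter:
    """Hypothetical stand-in embedding: a lower-cased bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks most similar to the query."""
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]


if __name__ == "__main__":
    doc = ("Staff must submit leave requests two weeks in advance. "
           "Parking permits are issued by the facility office.")
    chunks = chunk_words(doc, chunk_size=8, overlap=2)
    print(retrieve("parking permit", chunks, k=1))
```

In a real system, `embed` would be replaced by a dense embedding model, and `chunk_size` and `overlap` would be varied and scored against a set of test questions, which is the kind of parameter tuning the abstract refers to.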

Authors

  • Jan Bossenz
    Junior Research Group (Bio-)Medical Data Science, Faculty of Medicine, Martin Luther University Halle-Wittenberg, Halle (Saale), Germany.
  • Carlo Günzl
    Junior Research Group (Bio-)Medical Data Science, Faculty of Medicine, Martin Luther University Halle-Wittenberg, Halle (Saale), Germany.
  • Fabian Berns
    medicalvalues GmbH, Karlsruhe, Germany.
  • Annemarie Weise
    Junior Research Group (Bio-)Medical Data Science, Faculty of Medicine, Martin Luther University Halle-Wittenberg, Halle (Saale), Germany.
  • Christian Jäger
    Junior Research Group (Bio-)Medical Data Science, Faculty of Medicine, Martin Luther University Halle-Wittenberg, Halle (Saale), Germany.
  • Jan Kirchhoff
    medicalvalues GmbH, Karlsruhe, Germany.
  • Jan Christoph
    Medical Informatics, Friedrich-Alexander University Erlangen-Nürnberg, Erlangen, Germany.
  • Christoph Demus
    Junior Research Group (Bio-)Medical Data Science, Faculty of Medicine, Martin Luther University Halle-Wittenberg, Halle (Saale), Germany.