German Medical NER with BERT and LLMs: The Impact of Training Data Size

Journal: Studies in health technology and informatics
Published Date:

Abstract

Named Entity Recognition (NER) in the medical domain often poses significant challenges because of the complexity and specificity of medical terminology, especially in lower-resource settings where annotated data is scarce. This study compares the NER performance of an exemplary large language model (LLaMA3.1) and a BERT-based model on German medical texts. To simulate lower-resource conditions, we focus on how the performance of both models changes as the amount of training data varies. Both models are evaluated on two annotated German corpora. Our results show that both models perform rather similarly on both datasets, with LLaMA3.1 performing slightly better when less training material is available.
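The abstract describes varying the amount of training data to simulate lower-resource conditions, but it does not state how the smaller training sets were drawn. One common approach is to shuffle the training documents once with a fixed seed and take growing prefixes. This yields nested subsets, so that a difference between two training sizes reflects the amount of data rather than which documents happened to be sampled. The minimal sketch below assumes this nested-subset scheme. The function name `nested_subsets`, the fractions, the seed, and the toy document IDs are hypothetical illustration choices, not the authors' settings.

```python
import random
from typing import Dict, List, Sequence, TypeVar

T = TypeVar("T")


def nested_subsets(
    train_docs: Sequence[T],
    fractions: Sequence[float],
    seed: int = 42,
) -> Dict[float, List[T]]:
    """Return one training subset per fraction, each contained in the next larger one.

    Hypothetical helper (an assumption, not the paper's procedure): shuffling once
    and taking prefixes of the same permutation guarantees that the 10% subset is
    inside the 25% subset, the 25% subset is inside the 50% subset, and so on.
    """
    for f in fractions:
        if not 0.0 < f <= 1.0:
            raise ValueError(f"fraction must be in (0, 1], got {f}")

    order = list(train_docs)
    random.Random(seed).shuffle(order)  # one fixed permutation shared by all sizes

    subsets: Dict[float, List[T]] = {}
    for f in sorted(fractions):
        n = max(1, round(f * len(order)))  # keep at least one document per subset
        subsets[f] = order[:n]
    return subsets


if __name__ == "__main__":
    # Hypothetical corpus of 200 document IDs standing in for an annotated German corpus.
    docs = [f"doc_{i:03d}" for i in range(200)]
    splits = nested_subsets(docs, fractions=[0.1, 0.25, 0.5, 1.0])
    for frac, subset in splits.items():
        print(f"{frac:>5.0%}: {len(subset)} training documents")
```

Each subset would then be used to fine-tune the BERT-based model and to adapt the LLM separately, with evaluation always on the same held-out test split. Repeating the procedure with several seeds would show how much of a score difference is due to sampling variance.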

Authors

  • Suteera Seeha
    Institute of AI and Informatics in Medicine (AIIM), TUM University Hospital, Technical University of Munich, Munich, Germany.
  • Sihan Wu
    Institute of AI and Informatics in Medicine (AIIM), TUM University Hospital, Technical University of Munich, Munich, Germany.
  • Justin Hofenbitzer
    Institute of AI and Informatics in Medicine (AIIM), TUM University Hospital, Technical University of Munich, Munich, Germany.
  • Claudio Benzoni
    Institute of AI and Informatics in Medicine (AIIM), TUM University Hospital, Technical University of Munich, Munich, Germany.
  • Peter Pallaoro
    Institute for AI and Informatics in Medicine, School of Medicine and Health, Technical University of Munich, Munich, Germany.
  • Raphael Scheible
    Institute of Medical Informatics, Statistics and Epidemiology, University Hospital rechts der Isar, Technical University of Munich, Munich, Germany.
  • Martin Boeker
    Institute for Medical Biometry and Statistics, Medical Center - University of Freiburg, Faculty of Medicine, Stefan-Meier-Str. 26, Freiburg i. Br., 79104, Germany. martin.boeker@uniklinik-freiburg.de.
  • Luise Modersohn
    JULIE Lab, Friedrich Schiller University Jena, Germany.