German Medical NER with BERT and LLMs: The Impact of Training Data Size.
Journal:
Studies in health technology and informatics
Published Date:
May 15, 2025
Abstract
Named Entity Recognition (NER) in the medical domain often presents significant challenges due to the complexity and specificity of medical terminology, especially in lower-resource settings where annotated data is scarce. This study compares the performance of an exemplary large language model and a BERT-based model on NER for German medical texts. We focus on the impact of training data size on model performance, simulating lower-resource conditions. Both models are evaluated on two annotated German corpora. Our results reveal that both models perform rather similarly on both datasets, with LLaMA3.1 performing slightly better with less training material.
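The low-resource simulation described in the abstract is typically done by training on nested subsets of one annotated corpus. A minimal sketch of such a subsampling step is shown below; the helper name, the placeholder corpus, and the fraction values are illustrative assumptions, not details taken from the paper:

```python
import random

def nested_subsets(sentences, fractions, seed=42):
    """Build nested training subsets of increasing size from one
    annotated corpus (each smaller subset is contained in the larger
    ones), a common way to simulate lower-resource conditions.
    `sentences` is a list of annotated sentences; `fractions` gives
    the subset sizes as fractions of the full corpus."""
    rng = random.Random(seed)        # fixed seed for reproducibility
    shuffled = sentences[:]          # copy so the input stays intact
    rng.shuffle(shuffled)
    subsets = {}
    for frac in sorted(fractions):
        n = max(1, int(len(shuffled) * frac))
        subsets[frac] = shuffled[:n]  # shared prefix => nested subsets
    return subsets

# Hypothetical corpus of 1000 annotated sentences.
corpus = [f"sent_{i}" for i in range(1000)]
subsets = nested_subsets(corpus, [0.1, 0.25, 0.5, 1.0])
print({f: len(s) for f, s in subsets.items()})
# -> {0.1: 100, 0.25: 250, 0.5: 500, 1.0: 1000}
```

Each subset would then be used to fine-tune the BERT model (or build few-shot prompts for the LLM) so that scores can be compared across training sizes.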