German Medical NER with BERT and LLMs: The Impact of Training Data Size.
Journal:
Studies in health technology and informatics
Published Date:
May 15, 2025
Abstract
Named Entity Recognition (NER) in the medical domain often presents significant challenges due to the complexity and specificity of medical terminology, especially in lower-resource settings where annotated data is scarce. This study compares the performance of an exemplary large language model and a BERT-based model on NER for German medical texts. We focus on the impact of training data size on model performance, simulating lower-resource conditions. Both models are evaluated on two annotated German corpora. Our results reveal that both models perform rather similarly on both datasets, with LLaMA3.1 performing slightly better with less training material.
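The low-resource simulation described in the abstract is typically done by training on nested subsets of one annotated corpus. A minimal sketch of such a subsampling step is shown below; the helper name, the placeholder corpus, and the fraction values are illustrative assumptions, not details taken from the paper:

```python
import random

def nested_subsets(sentences, fractions, seed=42):
    """Build nested training subsets of increasing size from one
    annotated corpus (each smaller subset is contained in the larger
    ones), a common way to simulate lower-resource conditions.
    `sentences` is a list of annotated sentences; `fractions` gives
    the subset sizes as fractions of the full corpus."""
    rng = random.Random(seed)        # fixed seed for reproducibility
    shuffled = sentences[:]          # copy so the input stays intact
    rng.shuffle(shuffled)
    subsets = {}
    for frac in sorted(fractions):
        n = max(1, int(len(shuffled) * frac))
        subsets[frac] = shuffled[:n]  # shared prefix => nested subsets
    return subsets

# Hypothetical corpus of 1000 annotated sentences.
corpus = [f"sent_{i}" for i in range(1000)]
subsets = nested_subsets(corpus, [0.1, 0.25, 0.5, 1.0])
print({f: len(s) for f, s in subsets.items()})
# -> {0.1: 100, 0.25: 250, 0.5: 500, 1.0: 1000}
```

Each subset would then be used to fine-tune the BERT model (or build few-shot prompts for the LLM) so that scores can be compared across training sizes.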