Transformer-Based Multilabel NER Using Wikipedia Corpora in Multiple Languages.

Journal: Studies in health technology and informatics

Published Date: May 15, 2025

Abstract

The high cost of manual data labeling, together with privacy concerns, has led to a considerable shortage of annotated medical text in languages other than English. Recent work by Frei and Kramer [1] introduces an unsupervised approach for constructing ontology-annotated corpora from Wikipedia (https://www.wikidata.org) for German medical NER. We evaluate the proposed approach across English, German, Spanish, and French for medication and diagnosis entity recognition. Compared to the baseline, our multilabel corpora yield notable improvements in German medication detection under sparse annotations, with consistent performance across the other languages.

Authors

Yelyzaveta Ahapova

IT-Infrastructure for Translational Medical Research, University of Augsburg, Germany.
Johann Frei
Frank Kramer

IT-Infrastructure for Translational Medical Research, Faculty of Applied Computer Science, Faculty of Medicine, University of Augsburg, Augsburg, Germany.

Keywords

Data Mining; Encyclopedias as Topic; Germany; Internet; Language; Natural Language Processing

External Resources

PubMed ID: 40380596
