Transformer-Based Multilabel NER Using Wikipedia Corpora in Multiple Languages.

Journal: Studies in health technology and informatics

Abstract

Privacy concerns and the high cost of manual data labeling result in a considerable dearth of medical annotations in non-English texts. Recent work by Frei and Kramer [1] introduces an unsupervised approach for constructing ontology-annotated corpora from Wikipedia (https://www.wikidata.org) for German medical NER. We evaluate this approach across English, German, Spanish, and French for medication and diagnosis entity recognition. Compared to the baseline, our multilabel corpora yield notable improvements in German medication detection under sparse annotations, with consistent performance across the other languages.

Authors

  • Yelyzaveta Ahapova
    IT-Infrastructure for Translational Medical Research, University of Augsburg, Germany.
  • Johann Frei
  • Frank Kramer
    IT-Infrastructure for Translational Medical Research, Faculty of Applied Computer Science, Faculty of Medicine, University of Augsburg, Augsburg, Germany.