Domain Shift in Part-of-Speech Tagging.

Journal: Studies in health technology and informatics

Abstract

This study examines domain shift, that is, differences in dataset distributions that affect machine learning performance, in clinical natural language processing. It analyzes linguistic differences across English clinical narratives, biomedical abstracts, and news articles using part-of-speech (POS) tag distributions. The results indicate significant variation in POS tag occurrences, with undefined tags more frequent in the clinical datasets. These findings emphasize the need for specialized tools and improved domain adaptation techniques.
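
As a rough illustration of what comparing POS tag distributions across domains can look like, the following minimal sketch computes relative tag frequencies for two already tagged token lists and a Jensen-Shannon divergence between them. Everything here is an assumption made for illustration: the abstract does not name the tagger, the tag set, or the comparison measure used in the study, the function names `tag_distribution` and `js_divergence` are hypothetical, the tag `X` merely stands in for an "undefined" tag, and the two toy samples are invented.

```python
from collections import Counter
from math import log2


def tag_distribution(tagged_tokens):
    """Return the relative frequency of each POS tag in (token, tag) pairs."""
    counts = Counter(tag for _, tag in tagged_tokens)
    total = sum(counts.values())
    return {tag: n / total for tag, n in counts.items()}


def js_divergence(p, q):
    """Jensen-Shannon divergence (log base 2) between two tag distributions.

    The value is 0 for identical distributions and at most 1.
    This measure is an illustrative choice, not the one used in the study.
    """
    tags = set(p) | set(q)
    # Mixture distribution: the average of p and q over all observed tags.
    m = {t: 0.5 * (p.get(t, 0.0) + q.get(t, 0.0)) for t in tags}

    def kl_to_mixture(a):
        # Kullback-Leibler divergence from a to the mixture m.
        return sum(a[t] * log2(a[t] / m[t]) for t in a if a[t] > 0)

    return 0.5 * kl_to_mixture(p) + 0.5 * kl_to_mixture(q)


# Invented toy samples; "X" is assumed to mark tokens without a defined tag.
clinical = [("pt", "X"), ("c/o", "X"), ("chest", "NOUN"), ("pain", "NOUN")]
news = [("The", "DET"), ("minister", "NOUN"),
        ("resigned", "VERB"), ("yesterday", "NOUN")]

clinical_dist = tag_distribution(clinical)
news_dist = tag_distribution(news)

print("share of X tags (clinical):", clinical_dist.get("X", 0.0))
print("share of X tags (news):", news_dist.get("X", 0.0))
print("JS divergence:", js_divergence(clinical_dist, news_dist))
```

A symmetric, bounded measure is convenient in a sketch like this because tags that occur in only one corpus do not cause division by zero; other choices, such as a chi-squared test on the raw counts, would be equally plausible.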

Authors

  • Amila Kugic
    Institute for Medical Informatics, Statistics and Documentation, Medical University of Graz, Graz, Austria.
  • Stefan Schulz
    Institute for Medical Informatics, Statistics and Documentation, Medical University of Graz, Austria.
  • Markus Kreuzthaler
    Institute for Medical Informatics, Statistics and Documentation, Medical University of Graz, Austria.