Domain Shift in Part-of-Speech Tagging
Journal: Studies in Health Technology and Informatics
Published date: May 15, 2025
Abstract
This study examines domain shift, that is, differences in data distributions across domains that degrade machine learning performance in clinical natural language processing. Using part-of-speech (POS) tag distributions, it analyzes linguistic differences among English clinical narratives, biomedical abstracts, and news articles. The results show significant variation in POS tag frequencies, with undefined tags occurring more often in the clinical datasets. These findings underscore the need for specialized tools and improved domain adaptation techniques to address such shifts.
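The kind of comparison described in the abstract can be illustrated with a minimal Python sketch that computes relative POS tag frequencies for two corpora and measures how far apart the two distributions are. It assumes the text has already been tagged. The tag sequences are hypothetical toy data, the tag `X` is used here only as a stand-in for an undefined tag, and the Jensen-Shannon divergence is one possible distance measure chosen for illustration, not necessarily the statistic used in the study.

```python
import math
from collections import Counter


def tag_distribution(tags):
    """Return the relative frequency of each POS tag in a sequence of tags."""
    counts = Counter(tags)
    total = sum(counts.values())
    return {tag: n / total for tag, n in counts.items()}


def js_divergence(p, q):
    """Jensen-Shannon divergence (log base 2, so the range is [0, 1])
    between two tag distributions given as {tag: probability} dicts."""
    tags = set(p) | set(q)
    # Mixture distribution: the average of p and q for every tag.
    m = {t: 0.5 * (p.get(t, 0.0) + q.get(t, 0.0)) for t in tags}

    def kl_to_m(a):
        # KL(a || m); tags with zero probability in a contribute nothing.
        return sum(a[t] * math.log2(a[t] / m[t]) for t in tags if a.get(t, 0.0) > 0)

    return 0.5 * kl_to_m(p) + 0.5 * kl_to_m(q)


# Hypothetical pre-tagged toy corpora (not data from the study).
clinical_tags = ["NOUN", "NOUN", "X", "NUM", "X", "ADJ"]
news_tags = ["DET", "NOUN", "VERB", "DET", "ADJ", "NOUN"]

p_clinical = tag_distribution(clinical_tags)
p_news = tag_distribution(news_tags)

# Share of the stand-in "undefined" tag in each corpus.
print(f"X share (clinical): {p_clinical.get('X', 0.0):.2f}")
print(f"X share (news):     {p_news.get('X', 0.0):.2f}")
print(f"JS divergence:      {js_divergence(p_clinical, p_news):.3f}")
```

A divergence of 0 means the two corpora have identical tag distributions, while 1 means they share no tags at all; a per-tag comparison such as the `X` share shows which categories drive the difference.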