Data Augmentation Techniques for Chinese Disease Name Normalization
Journal: arXiv
Published Date: Jan 2, 2025
Abstract
Disease name normalization is an important task in the medical domain. It
classifies disease names written in various formats into standardized names,
serving as a fundamental component in smart healthcare systems for various
disease-related functions. However, the main obstacle for existing disease name
normalization systems is a severe shortage of training data. To address this, we
present a novel data augmentation approach that combines a series of data
augmentation techniques with several supporting modules to help
mitigate the problem. Through extensive experiments, we show that our
proposed approach yields significant performance improvements across various
baseline models and training objectives, particularly in scenarios with limited
training data.