Efficient Maintenance of Large-Scale Medical Dictionaries Using Large Language Models: A Case for Biomarkers.
Journal: Studies in Health Technology and Informatics
Published Date: Aug 7, 2025
Abstract
Dictionaries are essential in natural language processing and are valuable across many tasks; however, they are expensive to construct and maintain. Using manual revision histories to suggest corrections for terms that have not yet been edited is a promising way to improve quality while reducing cost. This study proposes a method for automatically correcting metadata in a large-scale medical dictionary containing more than 500,000 terms. Because large language models perform well in zero-shot settings, the system can estimate dictionary metadata without task-specific configuration. The method was demonstrated in experiments on gene biomarker expression variations, a task that requires specialized medical knowledge. The results indicate that this approach can significantly reduce the burden of dictionary maintenance.
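The abstract describes the correction loop only at a high level. As a rough illustration, the sketch below assumes that each dictionary entry carries a single categorical metadata field and a flag recording whether a human curator has already revised it. Unedited entries are sent to a language model with a zero-shot prompt (an instruction and the allowed labels, with no worked examples), and any parseable answer that differs from the stored value is returned as a suggested correction. The `Entry` class, the field name `expression_variation`, the label set, the prompt wording, and the `llm` callable are all hypothetical and are not taken from the paper; a real model client would be passed in as `llm`.

```python
import json
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Entry:
    term: str
    field: str              # hypothetical metadata field, e.g. "expression_variation"
    value: Optional[str]    # currently stored metadata value
    edited: bool            # True if a human curator has already revised this entry


def build_prompt(entry: Entry, allowed: List[str]) -> str:
    """Zero-shot prompt: an instruction and the label set, with no worked examples."""
    options = ", ".join(allowed)
    return (
        "You are a medical terminology expert.\n"
        f"Term: {entry.term}\n"
        f"Choose the correct value of '{entry.field}' from: {options}.\n"
        'Answer only with JSON of the form {"value": "..."}.'
    )


def suggest_corrections(
    entries: List[Entry],
    allowed: List[str],
    llm: Callable[[str], str],  # hypothetical: any function mapping a prompt to model text
) -> List[Tuple[str, Optional[str], str]]:
    """Return (term, stored value, suggested value) for unedited entries the model disagrees with."""
    suggestions = []
    for entry in entries:
        if entry.edited:
            continue  # trust human revisions; only unedited entries are checked
        try:
            answer = json.loads(llm(build_prompt(entry, allowed))).get("value")
        except (json.JSONDecodeError, AttributeError):
            continue  # skip model output that is not a JSON object
        if answer in allowed and answer != entry.value:
            suggestions.append((entry.term, entry.value, answer))
    return suggestions


# Usage with a stand-in model so the sketch runs offline.
def fake_llm(prompt: str) -> str:
    return '{"value": "overexpression"}' if "Term: HER2" in prompt else '{"value": "none"}'


LABELS = ["overexpression", "underexpression", "none"]  # hypothetical label set
entries = [
    Entry("HER2 overexpression", "expression_variation", "none", edited=False),
    Entry("EGFR", "expression_variation", "none", edited=True),
    Entry("KRAS", "expression_variation", "none", edited=False),
]
print(suggest_corrections(entries, LABELS, fake_llm))
# → [('HER2 overexpression', 'none', 'overexpression')]
```

Keeping the model behind a plain callable keeps the sketch independent of any particular LLM API, and returning suggestions rather than overwriting stored values leaves the final decision with a human curator.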