Few-Shot Learning of Medical Coding Systems: A Case Study on Death Certificates with BERT and Mistral.
Journal:
Studies in health technology and informatics
Published Date:
May 15, 2025
Abstract
Identifying the Underlying Cause of Death accurately is crucial for effective healthcare policy and planning. The World Health Organization recommends using the ICD-10 system to standardize death certificate coding, a task often supported by semi-automated systems. This paper assesses the effectiveness of BERT and Mistral language models in automating this process, focusing particularly on their handling of varied instance densities per ICD code, ranging from 1 (simulating the introduction of a new code) to 100 (representing well-established codes). Through extensive comparative experiments, we find that the finetuned Mistral model substantially outperforms BERT, especially in scenarios with limited data. Mistral's higher effectiveness, even with limited data from less commonly used codes, highlight its potential to significantly enhance automated coding systems.