Few-Shot Learning of Medical Coding Systems: A Case Study on Death Certificates with BERT and Mistral.

Journal: Studies in health technology and informatics
Published Date:

Abstract

Identifying the Underlying Cause of Death accurately is crucial for effective healthcare policy and planning. The World Health Organization recommends using the ICD-10 system to standardize death certificate coding, a task often supported by semi-automated systems. This paper assesses the effectiveness of BERT and Mistral language models in automating this process, focusing particularly on their handling of varied instance densities per ICD code, ranging from 1 (simulating the introduction of a new code) to 100 (representing well-established codes). Through extensive comparative experiments, we find that the finetuned Mistral model substantially outperforms BERT, especially in scenarios with limited data. Mistral's higher effectiveness, even with limited data from less commonly used codes, highlight its potential to significantly enhance automated coding systems.

Authors

  • Mihai Horia Popescu
    University of Udine, Department of Mathematics, Computer Science and Physics.
  • Vincenzo Della Mea
    Department of Mathematics, Computer Science and Physics, University of Udine Italy.
  • Kevin Roitero
    University of Udine, Department of Mathematics, Computer Science and Physics.