Adapting Large Language Models for Automated Summarisation of Electronic Medical Records in Clinical Coding.

Journal: Studies in health technology and informatics
PMID:

Abstract

Encapsulating a patient's clinical narrative into a condensed, informative summary is indispensable to clinical coding. The intricate nature of clinical text makes summarisation challenging for clinical coders. Recent developments in large language models (LLMs) have shown promising performance in clinical text summarisation, particularly for radiology and echocardiographic reports, after adaptation to the clinical domain. To explore the summarisation potential of clinical domain adaptation for LLMs, a clinical text dataset consisting of electronic medical records paired with their "Brief Hospital Course" sections from the MIMIC-III database was curated. Two open-source LLMs, one pre-trained on biomedical datasets and the other on general-domain content, were then fine-tuned on the curated clinical dataset, and the performance of the fine-tuned models was evaluated against that of their base models. The model pre-trained on biomedical data demonstrated superior performance after clinical domain adaptation. This finding highlights the potential benefits of adapting LLMs pre-trained on a related domain rather than a more general one, and suggests a possible role for clinically adapted LLMs as assistive tools for clinical coders. Future work should explore adapting more advanced models to improve performance on higher-quality clinical datasets.
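The abstract does not specify how each record was split into model input and reference summary. The following is a minimal sketch, assuming MIMIC-III discharge summaries in which section headers sit on their own line and end with a colon. It treats the "Brief Hospital Course" section as the target summary and the remaining note text as the source. The function name `split_brief_hospital_course` and both header patterns are hypothetical illustrations, not the authors' curation pipeline, and the header heuristic will misfire on notes that do not follow this layout.

```python
import re
from typing import Optional, Tuple

# Assumption (hypothetical heuristic): a section header is a line made up only of
# a capitalised title followed by a colon, e.g. "Discharge Medications:".
HEADER_RE = re.compile(r"^[A-Z][A-Za-z /&-]{0,60}:[ \t]*$", re.MULTILINE)
# Assumption: the target section starts with a line beginning "Brief Hospital Course:".
TARGET_RE = re.compile(r"^brief hospital course:", re.IGNORECASE | re.MULTILINE)


def split_brief_hospital_course(note: str) -> Optional[Tuple[str, str]]:
    """Split a discharge summary into (source_text, target_summary).

    Returns None when no non-empty "Brief Hospital Course" section is found.
    """
    start = TARGET_RE.search(note)
    if start is None:
        return None
    body_start = start.end()
    # The section ends at the next header line, or at the end of the note.
    next_header = HEADER_RE.search(note, body_start)
    body_end = next_header.start() if next_header else len(note)
    target = note[body_start:body_end].strip()
    if not target:
        return None
    # Source text is everything except the target section (header included).
    source = (note[: start.start()] + note[body_end:]).strip()
    return source, target
```

Removing the target section from the source keeps the reference summary from leaking into the model input, which a fine-tuning dataset of this kind would need to avoid.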

Authors

  • Bokang Bi
    Centre for Big Data Research in Health, The University of New South Wales, Sydney, Australia.
  • Leibo Liu
    Institute of Microelectronics, Tsinghua University, Beijing 100084, China. liulb@tsinghua.edu.cn.
  • Oscar Perez-Concha
    Centre for Big Data Research in Health, University of New South Wales, Sydney, NSW, Australia.