Large Language Models Can Be Good Medical Annotators: A Case Study of Drug Change Detection in Japanese EHRs.

Journal: Studies in Health Technology and Informatics

Abstract

In this study, we combined automatically generated labels from large language models (LLMs) with a small number of manual annotations to classify adverse event-related treatment discontinuations in Japanese electronic health records (EHRs). Fine-tuning JMedRoBERTa and T5 on 6,156 LLM-labeled records plus 200 manually labeled samples and evaluating on a 100-record test set, T5 achieved a precision of 0.83, albeit with a recall of only 0.25. When trained solely on the 200 human-labeled samples, which contained very few positive cases, the model failed to detect any adverse events, making reliable measurement of precision and recall infeasible (reported as N/A). These results underscore the potential of large-scale LLM-driven labeling, as well as the need to improve recall and label quality before use in practical clinical scenarios.
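
The labeling-and-fine-tuning pipeline summarized above can be illustrated with a minimal sketch. The snippet below assumes a standard Hugging Face setup; the checkpoint name (alabnii/jmedroberta-base-sentencepiece), file names, column names, and hyperparameters are illustrative assumptions rather than the authors' exact configuration.

```python
# Minimal sketch (not the authors' exact code): fine-tune a Japanese medical
# encoder on a mix of LLM-generated and human-annotated labels, then report
# precision/recall on a held-out test set. File names, column names, the
# checkpoint identifier, and hyperparameters are assumptions for illustration.
import pandas as pd
from datasets import Dataset
from sklearn.metrics import precision_score, recall_score
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

MODEL_NAME = "alabnii/jmedroberta-base-sentencepiece"  # assumed checkpoint

# Combine the two label sources (hypothetical CSVs with "text" and "label" columns).
llm_labeled = pd.read_csv("llm_labels.csv")        # ~6,156 LLM-annotated records
human_labeled = pd.read_csv("manual_labels.csv")   # ~200 manually annotated samples
train_df = pd.concat([llm_labeled, human_labeled], ignore_index=True)
test_df = pd.read_csv("test_labels.csv")           # 100-record held-out test set

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

train_ds = Dataset.from_pandas(train_df).map(tokenize, batched=True)
test_ds = Dataset.from_pandas(test_df).map(tokenize, batched=True)

model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME, num_labels=2)

args = TrainingArguments(
    output_dir="out",
    num_train_epochs=3,
    per_device_train_batch_size=16,
    learning_rate=2e-5,
    report_to="none",
)

trainer = Trainer(model=model, args=args, train_dataset=train_ds, tokenizer=tokenizer)
trainer.train()

# Binary precision/recall for the adverse event-related discontinuation class.
preds = trainer.predict(test_ds).predictions.argmax(axis=-1)
print("precision:", precision_score(test_df["label"], preds))
print("recall:", recall_score(test_df["label"], preds))
```

Concatenating the noisy LLM labels with the small human-annotated set is the simplest way to combine the two sources; in practice, the human labels could instead be up-weighted or reserved for validation, which may also help address the low recall noted above.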

Authors

  • Takeshi Ito
Nara Institute of Science and Technology (NAIST), Japan.
  • Tomohide Yoshie
    National Cerebral and Cardiovascular Center, Japan.
  • Sohei Yoshimura
    National Cerebral and Cardiovascular Center, Japan.
  • Nobuyuki Ohara
    Kobe City Medical Center General Hospital, Japan.
  • Shuntaro Yada
Nara Institute of Science and Technology (NAIST), Ikoma, Nara, Japan.
  • Shoko Wakamiya
    Nara Institute of Science and Technology (NAIST), Japan.
  • Eiji Aramaki
    Nara Institute of Science and Technology (NAIST), Japan.