Evaluating the Performance of Large Language Models for Named Entity Recognition in Ophthalmology Clinical Free-Text Notes.

Journal: AMIA Annual Symposium Proceedings

Abstract

This study compared large language models (LLMs) and Bidirectional Encoder Representations from Transformers (BERT) models on identifying medication names, routes, and frequencies in publicly available free-text ophthalmology progress notes from 480 patients. A total of 5,520 annotated lines of text were divided into training (N=3,864), validation (N=1,104), and test (N=552) sets. We evaluated four LLMs (GPT-3.5, GPT-4, PaLM 2, and Gemini) on identifying these medication entities, and we fine-tuned five BERT-family models (BERT, BioBERT, ClinicalBERT, DistilBERT, and RoBERTa) on the training set for the same task. On the test set, GPT-4 achieved the best overall performance (macro-averaged F1 0.962); among the BERT models, BioBERT performed best (macro-averaged F1 0.875). Modern LLMs outperformed fine-tuned BERT models even on the highly domain-specific task of identifying ophthalmic medication information in progress notes, demonstrating the potential of LLMs for medical named entity recognition to enhance patient care.
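For readers unfamiliar with the reported metric, the following minimal Python sketch illustrates how a macro-averaged F1 over the three entity types could be computed from per-type counts. This is not the authors' code, and the counts shown are placeholder values, not the study's data.

    # Minimal sketch: macro-averaged F1 across entity types, given per-type
    # true-positive (tp), false-positive (fp), and false-negative (fn) counts.

    def f1(tp: int, fp: int, fn: int) -> float:
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

    # Hypothetical counts per entity type (placeholders, not the paper's results).
    counts = {
        "medication_name": (95, 4, 3),
        "route":           (88, 6, 9),
        "frequency":       (90, 5, 7),
    }

    per_type_f1 = {entity: f1(*c) for entity, c in counts.items()}
    macro_f1 = sum(per_type_f1.values()) / len(per_type_f1)  # unweighted mean over types

    for entity, score in per_type_f1.items():
        print(f"{entity}: F1 = {score:.3f}")
    print(f"macro-averaged F1 = {macro_f1:.3f}")

Because the macro average weights each entity type equally regardless of how often it occurs, a model must perform well on all three entity types (names, routes, and frequencies) to achieve a high score.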

Authors

  • Iyad Majid
    Department of Ophthalmology, Byers Eye Institute, Stanford University, Stanford, California.
  • Vaibhav Mishra
    Stanford University School of Medicine, Palo Alto, CA, United States.
  • Rohith Ravindranath
    Department of Ophthalmology, Byers Eye Institute, Stanford University, Palo Alto, California.
  • Sophia Y Wang
    School of Medicine, Stanford University, Palo Alto, CA, United States.