Leveraging pretrained language models for seizure frequency extraction from epilepsy evaluation reports.

Journal: NPJ digital medicine
Published Date:

Abstract

Seizure frequency is essential for evaluating epilepsy treatment, ensuring patient safety, and reducing the risk of Sudden Unexpected Death in Epilepsy. As this information is often described in clinical narratives, this study presents an approach to extracting structured seizure frequency details from such unstructured text. We investigated two tasks: (1) extracting phrases describing seizure frequency, and (2) extracting seizure frequency attributes. For both tasks, we fine-tuned three BERT-based models (bert-large-cased, biobert-large-cased, and Bio_ClinicalBERT), as well as three generative large language models (GPT-4, GPT-3.5 Turbo, and Llama-2-70b-hf). The final structured output integrated the results from both tasks. GPT-4 attained the best performance across all tasks, with precision, recall, and F1-score of 86.61%, 85.04%, and 85.79%, respectively, for frequency phrase extraction; 90.23%, 93.51%, and 91.84% for seizure frequency attribute extraction; and 86.64%, 85.06%, and 85.82% for the final structured output. These findings highlight the potential of fine-tuned generative models in extractive tasks from limited text strings.
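
As a rough illustration of how a structured output that combines both tasks might be scored, the sketch below pairs an extracted phrase with its attributes and computes exact-match precision, recall, and F1. The record fields (`count`, `period`), the example phrases, and the exact-match criterion are assumptions made for illustration; the abstract does not specify the attribute schema or the matching rule used in the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SeizureFrequency:
    """Hypothetical structured record; field names are illustrative only."""
    phrase: str  # frequency phrase taken from the report (task 1)
    count: str   # assumed attribute: number of seizures (task 2)
    period: str  # assumed attribute: time unit of the count (task 2)


def precision_recall_f1(predicted, gold):
    """Exact-match precision, recall, and F1 over two collections of records."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Made-up gold annotations and model outputs, not data from the study.
gold = [
    SeizureFrequency("2 seizures per week", "2", "week"),
    SeizureFrequency("one seizure a month", "1", "month"),
    SeizureFrequency("3 per day", "3", "day"),
    SeizureFrequency("5 seizures per year", "5", "year"),
]
predicted = [
    SeizureFrequency("2 seizures per week", "2", "week"),
    SeizureFrequency("one seizure a month", "1", "month"),
    SeizureFrequency("3 per day", "3", "week"),  # wrong attribute, so no match
]

precision, recall, f1 = precision_recall_f1(predicted, gold)
print(f"precision={precision:.4f} recall={recall:.4f} f1={f1:.4f}")
```

Under this strict criterion a record counts as correct only when the phrase and every attribute match, so an error in either task lowers the score of the combined output.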

Authors

  • Rashmie Abeysinghe
    Department of Neurology, McGovern Medical School, The University of Texas Health Science Center at Houston, Houston, TX, USA.
  • Shiqiang Tao
    University of Texas Health Science Center at Houston, Houston, TX 77030.
  • Samden D Lhatoo
    University of Texas Health Science Center at Houston, Houston, TX 77030.
  • Guo-Qiang Zhang
    University of Texas Health Science Center at Houston, Houston, TX 77030.
  • Licong Cui
    The University of Texas Health Science Center at Houston, USA.
