Automated Seizure Classification Using Multimodal Large Language Models

Journal: medRxiv
Published Date:

Abstract

Accurately distinguishing epileptic seizures (ES) from nonepileptic seizures (NES) is a significant clinical challenge that typically requires resource-intensive inpatient video-EEG monitoring. Here, we developed a novel method based on multimodal large language models (MLLMs) that automatically extracts semiological features from videos of seizure events and then classifies the events as ES or NES. Ninety videos of ES and NES events from 29 patients were obtained from an epilepsy monitoring unit at a large academic hospital. Events were labeled as ES or NES based on expert evaluation of video-EEG recordings and simultaneously annotated with 24 clinically relevant semiological features. We implemented an MLLM framework that integrates open-source vision-language models (VLMs) and audio-language models (ALMs) to analyze the videos and associated audio tracks and automatically extract these 24 features. The performance of the MLLM-based feature extraction was evaluated against expert annotations. These features were subsequently used to train several classifiers, including K-Nearest Neighbors (KNN), XGBoost, and Deep Factorization Machine, to differentiate ES from NES. Model performance was evaluated using leave-one-patient-out (LOPO) cross-validation. Using KNN, expert-annotated semiological features achieved a precision of 0.97, recall of 0.97, F1-score of 0.97, and AUC of 0.99, establishing an upper bound on ES/NES classification performance. Compared with expert annotations, the MLLM pipeline achieved an overall mean recall of 0.71, mean accuracy of 0.58, and mean F1-score of 0.51 for semiological feature extraction. The best-performing KNN model (k=7) using MLLM-extracted features achieved a precision of 0.77, recall of 0.76, F1-score of 0.76, and AUC of 0.76 in classifying ES versus NES, correctly identifying 68 of 90 events.
We demonstrate the feasibility of using MLLMs to automatically extract clinically relevant semiological features from seizure videos and classify ES versus NES. MLLM-based feature extraction and classification offer a promising, clinically interpretable approach to aid the diagnosis of epilepsy from videos.
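
The leave-one-patient-out evaluation named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes each event is encoded as a binary feature vector and that neighbors are ranked by Hamming distance; the abstract specifies neither. The function names, the toy data, and the choice of k=3 for the toy run are all hypothetical (the abstract reports k=7 on the real data).

```python
from collections import Counter


def hamming(a, b):
    """Count the feature positions where two binary vectors differ."""
    return sum(x != y for x, y in zip(a, b))


def knn_predict(train_X, train_y, query, k):
    """Majority vote among the k training events nearest to `query`."""
    order = sorted(range(len(train_X)), key=lambda i: hamming(train_X[i], query))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]


def lopo_predictions(X, y, patient_ids, k=7):
    """Leave-one-patient-out: each event is predicted by a model that
    was fitted without any event from the same patient."""
    preds = [None] * len(X)
    for held_out in sorted(set(patient_ids)):
        train_idx = [i for i, p in enumerate(patient_ids) if p != held_out]
        test_idx = [i for i, p in enumerate(patient_ids) if p == held_out]
        train_X = [X[i] for i in train_idx]
        train_y = [y[i] for i in train_idx]
        for i in test_idx:
            preds[i] = knn_predict(train_X, train_y, X[i], min(k, len(train_X)))
    return preds


# Hypothetical toy data: 8 events, 4 patients, 3 binary features each.
X = [
    [1, 1, 0], [1, 1, 1],  # patient P1
    [0, 0, 1], [0, 0, 0],  # patient P2
    [1, 1, 0], [1, 1, 1],  # patient P3
    [0, 0, 0], [0, 0, 1],  # patient P4
]
y = ["ES", "ES", "NES", "NES", "ES", "ES", "NES", "NES"]
patients = ["P1", "P1", "P2", "P2", "P3", "P3", "P4", "P4"]

preds = lopo_predictions(X, y, patients, k=3)
accuracy = sum(p == t for p, t in zip(preds, y)) / len(y)
```

Grouping the folds by patient rather than by event keeps several events from the same patient from appearing on both sides of a split, which would otherwise inflate the reported performance.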

Authors

  • Lina Zhang; Richard Jiang; Tonmoy Monsoor; Jessica N. Pasqua; Colin M. McCrimmon; Prateik Sinha; Kartik Sharma; Muayad Alzuabi; Victor Morales; Hailey M. Miranda; Chaya Manjeshwar; Vwani Roychowdhury; Rajarshi Mazumder

Categories