Artificial intelligence in neurovascular decision-making: a comparative analysis of ChatGPT-4 and multidisciplinary expert recommendations for unruptured intracranial aneurysms.

Journal: Neurosurgical review
PMID:

Abstract

In the multidisciplinary treatment of cerebrovascular diseases, specialists from different disciplines strive to develop patient-specific treatment recommendations. ChatGPT is a natural language processing chatbot with increasing applicability in medical practice. This study evaluates ChatGPT's ability to provide treatment recommendations for patients with unruptured intracranial aneurysms (UIA). Anonymized patient data and radiological reports of 20 patients with UIAs were provided to GPT-4 in a standardized format and used to generate a treatment recommendation for different clinical scenarios. GPT-4 responses were rated by a multidisciplinary panel of specialists on a Likert scale and subsequently benchmarked against the Unruptured Intracranial Aneurysm Treatment Score (UIATS) as well as the actual treatment decision made by the multidisciplinary institutional neurovascular board (INVB). Agreement between expert raters was measured using a linearly weighted Fleiss' kappa coefficient. GPT-4 analyzed individual pathological features of the radiological reports and formulated a corresponding assessment for each aspect. None of the generated recommendations showed evidence of factual hallucination, although in 25% of the cases no specific recommendation could be derived from the GPT-4 responses. The expert panel rated the overall quality of the GPT-4 recommendations with a median of 3.4 out of 5 points. The GPT-4 recommendations were congruent with those of the INVB in 65% of cases. Interrater reliability among experts showed low to moderate agreement in the assessment of AI-assisted decision making. GPT-4 appears to be able to process clinical information about UIAs and generate treatment recommendations. However, the level of ambiguity and the utilization of scientific evidence in the recommendations are not yet patient- or case-specific enough to substitute for the decision-making of a multidisciplinary neurovascular board. A prospective evaluation of GPT-4 competence as a companion in decision-making panels is deemed necessary.
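
For readers unfamiliar with the agreement statistic named in the abstract, the following is a minimal Python sketch of the standard (unweighted) Fleiss' kappa for several raters scoring the same cases. It is not the authors' analysis code: the study reports a linearly weighted variant, which additionally gives partial credit to near-miss ratings on the ordinal scale. The function names and the example ratings are hypothetical and serve only to illustrate the calculation.

```python
from typing import Sequence


def ratings_to_counts(ratings: Sequence[Sequence[int]],
                      categories: Sequence[int]) -> list[list[int]]:
    """Turn per-case rater scores into a case x category count table."""
    index = {category: i for i, category in enumerate(categories)}
    counts = []
    for case in ratings:
        row = [0] * len(categories)
        for score in case:
            row[index[score]] += 1
        counts.append(row)
    return counts


def fleiss_kappa(counts: Sequence[Sequence[int]]) -> float:
    """Unweighted Fleiss' kappa for a case x category count table.

    Every row must sum to the same number of raters (at least 2).
    """
    n_cases = len(counts)
    n_raters = sum(counts[0])
    if n_raters < 2:
        raise ValueError("at least two raters per case are required")
    if any(sum(row) != n_raters for row in counts):
        raise ValueError("every case must be rated by the same number of raters")

    # Observed agreement: share of agreeing rater pairs, averaged over cases.
    pairs = n_raters * (n_raters - 1)
    per_case = [(sum(c * c for c in row) - n_raters) / pairs for row in counts]
    p_observed = sum(per_case) / n_cases

    # Chance agreement: from the overall proportion of ratings per category.
    total = n_cases * n_raters
    n_categories = len(counts[0])
    proportions = [sum(row[j] for row in counts) / total
                   for j in range(n_categories)]
    p_expected = sum(p * p for p in proportions)

    if p_expected == 1.0:
        raise ValueError("kappa is undefined when all ratings share one category")
    return (p_observed - p_expected) / (1.0 - p_expected)


if __name__ == "__main__":
    # Hypothetical Likert ratings (1-5): four cases, three raters each.
    example_ratings = [[3, 4, 3], [2, 2, 5], [4, 4, 4], [1, 3, 2]]
    table = ratings_to_counts(example_ratings, categories=[1, 2, 3, 4, 5])
    print(f"Fleiss' kappa: {fleiss_kappa(table):.3f}")
```

A value of 1 indicates perfect agreement, and values near or below 0 indicate agreement no better than chance.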

Authors

  • Alexis Hadjiathanasiou
    Department of Neurosurgery, BG Klinikum Unfallkrankenhaus Berlin, Berlin, Germany. alexis.hadjiathanasiou@ukb.de.
  • Leonie Goelz
    Department of Radiology and Neuroradiology, BG Klinikum Unfallkrankenhaus Berlin, Berlin, Germany.
  • Florian Muhn
    Department of Neurology, BG Klinikum Unfallkrankenhaus Berlin, Berlin, Germany.
  • Rebecca Heinz
    Department of Neurosurgery, BG Klinikum Unfallkrankenhaus Berlin, Berlin, Germany.
  • Lutz Kreissl
    Department of Radiology and Neuroradiology, BG Klinikum Unfallkrankenhaus Berlin, Berlin, Germany.
  • Paul Sparenberg
    Department of Neurology, BG Klinikum Unfallkrankenhaus Berlin, Berlin, Germany.
  • Johannes Lemcke
    Department of Neurosurgery, BG Klinikum Unfallkrankenhaus Berlin, Berlin, Germany.
  • Ingo Schmehl
    Department of Neurology, BG Klinikum Unfallkrankenhaus Berlin, Berlin, Germany.
  • Sven Mutze
    Department of Radiology and Neuroradiology, BG Klinikum Unfallkrankenhaus Berlin, Berlin, Germany.
  • Patrick Schuss
    Department of Neurosurgery, BG Klinikum Unfallkrankenhaus Berlin, Berlin, Germany.