Feasibility of Using ChatGPT to Generate Exposure Hierarchies for Treating Obsessive-Compulsive Disorder.

Journal: Behavior Therapy
Published Date:

Abstract

Obsessive-compulsive disorder (OCD) is a chronic, severe condition. Although exposure and response prevention (ERP), the first-line treatment for OCD, is highly effective, too few clinicians are equipped to deliver it. One barrier is the time and expertise required to develop personalized exposure hierarchies. In this study, we examined the feasibility and promise of using large language models (LLMs) to generate appropriate exposure suggestions for OCD treatment. We used ChatGPT-4 (Generative Pretrained Transformer, Version 4) to generate 10-item exposure hierarchies for simulated patient cases that were systematically varied along the following dimensions: OCD subtype, symptom complexity (number of symptoms), level of symptom detail, patient age, and patient gender. Expert clinicians also generated hierarchies for a subset of prompts. ChatGPT-generated hierarchies were first rated for completeness and the degree to which input information was incorporated. Three OCD experts blinded to the aims of the study then rated each ChatGPT- and expert-generated hierarchy's appropriateness, specificity, variability, safety/ethics, and overall usefulness or quality. ChatGPT generated partial (n = 15) or complete (n = 55) responses to 70 of 72 prompts and incorporated most input information (M = 4.29 out of 5, SD = 0.85). The only significant predictor of the degree of input information incorporated was the number of OCD symptoms; prompts with the most symptoms were rated as incorporating less input information than prompts with low and moderate numbers of symptoms, ps < .05. Overall, ChatGPT-generated hierarchies were viewed as appropriate (M = 4.47, SD = 0.58), specific (M = 4.17, SD = 0.65), variable (M = 3.96, SD = 0.79), safe/ethical (M = 4.89, SD = 0.24), and useful (M = 3.99, SD = 0.82). However, expert human-generated hierarchies were still rated as significantly more appropriate, specific, variable, and useful, ps < .05, though not significantly more or less safe/ethical than ChatGPT-generated hierarchies, p = .24. Only the level of symptom detail included in prompts was associated with ratings of ChatGPT-generated hierarchies, ps < .05, such that hierarchies were rated significantly better when prompts had been more detailed. Results suggest that LLMs such as ChatGPT hold promise for helping clinicians generate effective OCD exposure hierarchies, while also highlighting key limitations that must be resolved prior to clinical implementation. Given that few clinicians specialize in OCD treatment, it would be advantageous to establish how face-to-face or digital treatment can be augmented with this technology.
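For readers who wish to replicate or extend the prompting procedure programmatically, the sketch below shows one way the factorial prompt design described above could be automated. This is not the authors' procedure: the study prompted ChatGPT-4 interactively, and the model name, prompt wording, case details, and API parameters here are illustrative assumptions built on the OpenAI Python client (openai>=1.0).

```python
# Illustrative sketch only: automate a factorial prompt design like the one in
# the study. Case details, model name, and wording are placeholder assumptions.
from itertools import product
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

SUBTYPES = ["contamination", "checking", "symmetry/ordering", "taboo thoughts"]
SYMPTOM_COUNTS = [2, 4, 6]            # low / moderate / high symptom complexity
DETAIL_LEVELS = ["brief", "detailed"]
AGES = [16, 35]
GENDERS = ["female", "male"]

def build_prompt(subtype, n_symptoms, detail, age, gender):
    """Compose a simulated-case prompt varied along the study's dimensions."""
    return (
        f"A {age}-year-old {gender} patient has {subtype}-type OCD with "
        f"{n_symptoms} primary symptoms, described in {detail} terms. "
        "Generate a 10-item exposure and response prevention (ERP) hierarchy, "
        "ordered from least to most distressing, tailored to this case."
    )

def generate_hierarchy(prompt):
    """Request one hierarchy from the model for a single simulated case."""
    response = client.chat.completions.create(
        model="gpt-4",  # placeholder; the paper used ChatGPT-4 via its interface
        messages=[{"role": "user", "content": prompt}],
        temperature=0.7,
    )
    return response.choices[0].message.content

if __name__ == "__main__":
    # Generate one hierarchy per cell of a (truncated) factorial design.
    for subtype, n, detail, age, gender in product(
        SUBTYPES[:1], SYMPTOM_COUNTS[:1], DETAIL_LEVELS, AGES[:1], GENDERS[:1]
    ):
        print(generate_hierarchy(build_prompt(subtype, n, detail, age, gender)))
```

In practice, outputs generated this way would still require expert review for appropriateness, specificity, and safety, as the study's rating procedure underscores.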

Authors

  • Emily E Bernstein
    Massachusetts General Hospital/Harvard Medical School.
  • Adam C Jaroszewski
    Department of Psychology, Harvard University.
  • Ryan J Jacoby
    Massachusetts General Hospital/Harvard Medical School.
  • Natasha H Bailen
    Massachusetts General Hospital/Harvard Medical School.
  • Jennifer Ragan
    Massachusetts General Hospital/Harvard Medical School.
  • Aisha Usmani
    Massachusetts General Hospital/Harvard Medical School.
  • Sabine Wilhelm
Massachusetts General Hospital/Harvard Medical School.