AI-Augmented LLMs Achieve Therapist-Level Responses in Motivational Interviewing
Journal: arXiv
Published Date: May 23, 2025
Abstract
Large language models (LLMs) such as GPT-4 show potential for scaling
motivational interviewing (MI) in addiction care, but their therapeutic
capabilities require systematic evaluation. We present a computational framework
that assesses user-perceived quality (UPQ) through expected and unexpected MI
behaviors. Analyzing human therapist and GPT-4 MI sessions via human-AI
collaboration, we developed predictive models integrating deep learning and
explainable AI to identify 17 MI-consistent (MICO) and MI-inconsistent (MIIN)
behavioral metrics. A customized chain-of-thought prompt improved GPT-4's MI
performance, reducing inappropriate advice while enhancing reflections and
empathy. Although GPT-4 remained marginally inferior to therapists overall, it
outperformed them in advice management. The model achieved
measurable quality improvements through prompt engineering, yet showed
limitations in addressing complex emotional nuances. This framework establishes
a pathway for optimizing LLM-based therapeutic tools through targeted
behavioral metric analysis and human-AI co-evaluation. The findings highlight both
the scalability potential and current constraints of LLMs in clinical
communication applications.