Illustrating a framework for assessing generative artificial intelligence-based conversational agents for mental health.
Journal:
Psychiatry Research
Published Date:
Dec 11, 2025
Abstract
BACKGROUND: Generative artificial intelligence (AI)-based conversational agents (CAs) hold promise for expanding access to mental health care. AIMS: This study aimed to illustrate the implementation of the Thera Turing Test by comparing the quality of conversations delivered by a human therapist and a CA. METHODS: The study examined two Parent Management Training conversations, Psychoeducation and Special Time, delivered by an AI CA (Pat) through written online conversation. All conversations were evaluated by graduate psychology students using the Thera Turing Test framework. Treatment fidelity and common therapeutic factors were assessed. RESULTS: For both conversations (Psychoeducation and Special Time), the human therapist and Pat each yielded high treatment fidelity (94.44% and 86%, respectively). For common factors, the human therapist scored 90.80%, which was higher than the score for any of Pat's conversations. DISCUSSION: These preliminary findings indicate that sessions conducted by a human and sessions conducted by Pat achieved comparably high treatment fidelity. Regarding common factors, Pat received lower ratings than the human therapist, indicating room for improvement. Overall, this report illustrates the benefits of using the Thera Turing Test to assess the quality of CAs and to inform needed improvements.