A Risk Taxonomy for Evaluating AI-Powered Psychotherapy Agents
Journal: arXiv
Published Date: May 21, 2025
Abstract
The proliferation of Large Language Models (LLMs) and Intelligent Virtual
Agents acting as psychotherapists presents significant opportunities for
expanding mental healthcare access. However, their deployment has also been
linked to serious adverse outcomes, including user harm and suicide, and these
harms are compounded by the absence of standardized evaluation methodologies
able to capture the nuanced risks of therapeutic interaction. Current evaluation
techniques lack the sensitivity to detect subtle changes in patient cognition
and behavior during therapy sessions that may lead to subsequent
decompensation. We introduce a novel risk taxonomy specifically designed for
the systematic evaluation of conversational AI psychotherapists. Developed
through an iterative process including review of the psychotherapy risk
literature, qualitative interviews with clinical and legal experts, and
alignment with established clinical criteria (e.g., DSM-5) and existing
assessment tools (e.g., NEQ, UE-ATR), the taxonomy aims to provide a structured
approach to identifying and assessing user/patient harms. We provide a
high-level overview of the taxonomy and its grounding, and we outline potential
use cases, two of which we discuss in detail. Both rely on monitoring
cognitive-model-based risk factors over the course of a counseling conversation
to detect unsafe deviations: one applies this monitoring to human-AI counseling
sessions, and the other to automated benchmarking of AI psychotherapists with
simulated patients. The proposed taxonomy is a foundational step toward safer,
more responsible innovation in AI-driven mental health support.