Assessing bias in AI-driven psychiatric recommendations: A comparative cross-sectional study of chatbot-classified and CANMAT 2023 guideline recommendations for adjunctive therapy in difficult-to-treat depression.
Journal:
Psychiatry Research
PMID:
40267866
Abstract
The integration of chatbots into psychiatry introduces a novel approach to supporting clinical decision-making, but biases in their recommendations pose significant concerns. This study investigates potential biases in chatbot-generated recommendations for adjunctive therapy in difficult-to-treat depression, comparing these outputs with the Canadian Network for Mood and Anxiety Treatments (CANMAT) 2023 guidelines. The analysis involved calculating Cohen's kappa coefficients to measure the overall level of agreement between chatbot-generated classifications and the CANMAT guidelines. Differences between chatbot-generated and CANMAT classifications for each medication were assessed using the Wilcoxon signed-rank test. Results reveal substantial agreement for high-performing models: Google AI's Gemini 2.0 Flash achieved the highest Cohen's kappa, 0.82 (SE = 0.052), whereas OpenAI's o1 model showed lower agreement, 0.746 (SE = 0.057). Notable discrepancies included overestimation of medications such as quetiapine and lithium and underestimation of modafinil and ketamine. OpenAI's chatbots also showed a distinct bias pattern, tending to over-recommend lithium and bupropion. Our study highlights both the promise and the challenges of employing AI tools in psychiatric practice, and advocates multi-model approaches to mitigate bias and improve clinical reliability.
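The two statistics named in the abstract can be illustrated with a short sketch. This is a minimal example on hypothetical toy data (not the study's data), assuming the recommendation categories are encoded as small integers (e.g., CANMAT first/second/third line as 1/2/3); the abstract does not specify the software used, so Python with `scipy` is an assumption here. Cohen's kappa is computed by hand from its definition, and the paired comparison uses `scipy.stats.wilcoxon`.

```python
# Illustrative only: toy classifications, NOT the study's data.
from scipy.stats import wilcoxon

def cohen_kappa(a, b):
    """Chance-corrected agreement between two raters over the same items."""
    n = len(a)
    p_obs = sum(x == y for x, y in zip(a, b)) / n          # observed agreement
    cats = set(a) | set(b)
    p_exp = sum((a.count(c) / n) * (b.count(c) / n) for c in cats)  # chance agreement
    return (p_obs - p_exp) / (1 - p_exp)

# Hypothetical line-of-treatment classifications for 10 adjunctive agents.
canmat  = [1, 1, 2, 2, 2, 3, 3, 1, 2, 3]   # guideline classifications
chatbot = [1, 2, 2, 2, 3, 3, 3, 1, 2, 2]   # chatbot classifications

kappa = cohen_kappa(canmat, chatbot)

# Wilcoxon signed-rank test on the paired classifications: a systematic
# shift in the signed differences would suggest over- or under-recommendation.
stat, p = wilcoxon(canmat, chatbot)

print(f"kappa = {kappa:.3f}, Wilcoxon p = {p:.3f}")
```

Per-medication comparisons, as in the study, would repeat the Wilcoxon test on each medication's paired ratings across prompts or runs rather than across medications as shown here.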