A Chatbot for the Management of Bipolar Disorder: Using Retrieval-Augmented Generation with an Open-Weight Large Language Model to Answer Clinical Questions Based on the CANMAT and ISBD 2018 Guidelines

Journal: medRxiv

Published Date: Jan 1, 2025

Abstract

Clinical practice guidelines support evidence-based care but are often underused due to complexity, time constraints, and navigation challenges. We investigated whether a conversational agent (chatbot) using an open-weight large language model (LLM) with retrieval-augmented generation (RAG) could provide guideline-consistent answers for bipolar disorder management based on the full 2018 CANMAT and ISBD guidelines, compared with a system using only the base LLM. We developed a multi-step RAG-based chatbot that retrieves relevant guideline sections and generates responses using Llama 3.3 70B. Twenty-one clinical vignettes spanning all guideline sections were created. Six expert psychiatrists generated queries and were presented with unlabeled paired responses from two systems: one using the base Llama 3.3 70B model, the other RAG-enhanced. Responses were rated for guideline consistency on a three-point scale and analyzed using mixed-effects ordinal logistic regression. Experts evaluated 126 response pairs; in 110 (87.3%), the RAG response was rated as correct as or more correct than the baseline response. The RAG system produced 80 answers (63.5%) rated fully consistent with the guidelines versus 24 (19.0%) for baseline, and only 10 answers (7.9%) with major deviation versus 48 (38.1%) for baseline. Ordinal regression showed that RAG responses were significantly more likely to be rated more correct (OR = 9.1, 95% CI 5.3–16.3, p < 0.001), an effect that was consistent across all raters. Preference ratings favored RAG answers in 78.7% of cases. Performance varied by vignette, with some errors in both retrieval and reasoning. The use of RAG with an open-weight model helped produce answers consistent with the CANMAT guidelines across vignettes that required adapting or combining guideline text, suggesting that a bipolar guideline chatbot is viable. We identified areas where results and evaluation could be improved. Future work should explore additional retrieval strategies and LLMs, and test the system in more naturalistic settings.

Authors

Yash Mali; Zejiao Zeng; Kayoung Heo; Grace Zhang; Jincheng Chen; Kamyar Keramatian; Gayatri Saraf; Marco Solmi; Edwin Tam; Sagar V. Parikh; Ayal Schaffer; Serge Beaulieu; Raymond Ng; Lakshmi N. Yatham; John-Jose Nunez

