Suicide- and crisis-risk detection using large language models in mental-health chatbots
Journal: medRxiv
Published Date: Jan 15, 2026
Abstract
Objective: Large language models (LLMs) are increasingly embedded in mental-health chatbots, yet safe deployment is limited by two unresolved challenges: (1) suicide- and crisis-risk detection lacks a definitive ground truth and is characterized by substantial clinician disagreement, and (2) most evaluations frame risk detection as an offline accuracy task rather than a real-time safety problem. This study aimed to empirically characterize these limitations and to derive design principles for uncertainty-aware, safety-oriented crisis detection in conversational artificial intelligence.
Methods and Analysis: We curated a clinician-labeled dataset of 200 real-world conversation segments drawn from a deployed mental-health chatbot. Five clinical experts independently annotated each segment for suicide- and crisis-related risk. Using a single base LLM, we implemented five prompt-defined detection variants with systematically increasing sensitivity thresholds, without task-specific training or fine-tuning. Models were evaluated against clinician consensus labels to quantify false-negative and false-positive trade-offs. Latency analyses assessed feasibility for real-time, per-turn monitoring.
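The evaluation against consensus labels can be illustrated with a minimal sketch. It assumes binary annotations (1 = at risk) and a simple majority vote over the five annotators; the abstract does not state how consensus was derived, so the majority rule, the function names, and the toy data below are all hypothetical.

```python
def consensus_label(votes):
    """Majority vote over binary annotator labels (assumed rule, 1 = at risk)."""
    return 1 if sum(votes) * 2 > len(votes) else 0


def error_rates(consensus, predictions):
    """Return (false-negative rate, false-positive rate) against consensus."""
    fn = sum(1 for c, p in zip(consensus, predictions) if c == 1 and p == 0)
    fp = sum(1 for c, p in zip(consensus, predictions) if c == 0 and p == 1)
    positives = sum(consensus)
    negatives = len(consensus) - positives
    fnr = fn / positives if positives else 0.0
    fpr = fp / negatives if negatives else 0.0
    return fnr, fpr


# Toy annotations: one row per conversation segment, one vote per clinician.
annotations = [
    [1, 1, 1, 0, 1],
    [0, 0, 1, 0, 0],
    [1, 1, 0, 1, 0],
    [0, 0, 0, 0, 0],
]
consensus = [consensus_label(v) for v in annotations]

# Hypothetical outputs of a low- and a high-sensitivity detector variant.
low_sensitivity = [1, 0, 0, 0]
high_sensitivity = [1, 1, 1, 0]

print(error_rates(consensus, low_sensitivity))
print(error_rates(consensus, high_sensitivity))
```

In this toy data the low-sensitivity variant misses one of two at-risk segments, while the high-sensitivity variant misses none but flags one of two segments the consensus rated as not at risk, mirroring the direction of the trade-off the study quantifies.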
Results: As sensitivity increased, the false-negative rate decreased monotonically from 87% to 0%, while false-positive rates rose accordingly. High- and extreme-sensitivity variants achieved near-perfect (98.9%) and perfect (100%) recall, demonstrating that near-zero-miss crisis detection from natural language is technically feasible in real time (mean latency <1 s). Importantly, model errors aligned closely with cases of clinician disagreement, indicating that misclassifications predominantly reflect irreducible uncertainty rather than model failure.
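One way to check whether errors concentrate in cases of clinician disagreement is to compare the error rate on unanimous segments with the rate on split segments. The sketch below is an illustration under assumptions, not the study's analysis: it uses majority-vote consensus, treats any non-unanimous vote as disagreement, and runs on invented data.

```python
def is_split(votes):
    """True when the annotators are not unanimous."""
    return 0 < sum(votes) < len(votes)


def error_rate_by_agreement(annotations, predictions):
    """Error rate against majority consensus, grouped by annotator agreement."""
    stats = {"unanimous": [0, 0], "split": [0, 0]}  # [errors, items]
    for votes, pred in zip(annotations, predictions):
        consensus = 1 if sum(votes) * 2 > len(votes) else 0
        group = "split" if is_split(votes) else "unanimous"
        stats[group][0] += int(pred != consensus)
        stats[group][1] += 1
    # None marks a group with no items, to avoid dividing by zero.
    return {g: (e / n if n else None) for g, (e, n) in stats.items()}


# Toy data: two unanimous segments followed by two split segments.
annotations = [
    [1, 1, 1, 1, 1],
    [0, 0, 0, 0, 0],
    [1, 1, 1, 0, 0],
    [0, 0, 1, 1, 0],
]
predictions = [1, 0, 0, 1]

rates = error_rate_by_agreement(annotations, predictions)
print(rates)
```

A markedly higher error rate in the split group than in the unanimous group would be consistent with the interpretation that misclassifications track label uncertainty.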
Conclusion: Suicide- and crisis-risk detection in conversational systems is inherently uncertain and should be reframed from an accuracy-oriented classification task toward an online, safety-oriented monitoring problem. Within this framing, near-zero-miss detection is achievable but necessarily incurs elevated false positives, motivating architectural rather than purely model-level solutions. We propose an operational emergency mode in which conservative risk detection operates independently from the conversational model, allowing supportive engagement to be maintained under heightened safety constraints. This layered, uncertainty-aware architecture provides a practical pathway for safer deployment of LLM-based mental-health chatbots without reliance on large training datasets or extensive model optimization.
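The separation between risk detection and dialogue generation might be wired up roughly as follows. This is a sketch under stated assumptions rather than the authors' implementation: `Session`, `handle_turn`, and both stub functions are hypothetical names, the detector is a trivial placeholder standing in for the LLM-based detector, and the choice to keep emergency mode on once triggered is an assumption the abstract does not confirm.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Session:
    """Per-conversation state; emergency_mode is off until risk is detected."""
    emergency_mode: bool = False


def handle_turn(
    session: Session,
    user_message: str,
    detect_risk: Callable[[str], bool],
    generate_reply: Callable[[str, bool], str],
) -> str:
    # Risk detection runs on every turn and does not depend on the reply model.
    if detect_risk(user_message):
        # Assumption: the mode latches and stays on for the rest of the session.
        session.emergency_mode = True
    # The conversational model still answers, under the current safety mode.
    return generate_reply(user_message, session.emergency_mode)


def stub_detector(message: str) -> bool:
    """Placeholder only; a real detector would be a conservative classifier."""
    return "hopeless" in message.lower()


def stub_responder(message: str, emergency: bool) -> str:
    """Placeholder for the conversational model."""
    return "[safety-constrained reply]" if emergency else "[standard reply]"


session = Session()
for text in ["I had a good day", "I feel hopeless", "Never mind"]:
    print(handle_turn(session, text, stub_detector, stub_responder))
```

Because the detector and the responder are passed in as separate callables, the detector's sensitivity can be tuned or replaced without touching the conversational model, which is the architectural point the conclusion makes.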
What is already known on this topic?
- Mental-health chatbots based on large language models are increasingly used for psychological support, but evaluations and real-world incidents show that general-purpose LLMs are unreliable in recognizing suicide- and crisis-related risk and may respond unsafely in high-stakes situations.
- Suicide and crisis risk assessment lacks a stable ground truth, with substantial inter-clinician disagreement even among trained experts, indicating that automated risk detection is inherently uncertain and cannot be treated as a conventional supervised classification task.
What this study adds
- Demonstrates that near-zero-miss suicide- and crisis-risk detection is technically feasible using clinician-validated data and prompt-based sensitivity calibration, without fine-tuning or large task-specific training datasets.
- Shows that detection errors are largely driven by irreducible uncertainty rather than model failure, as misclassifications systematically align with areas of clinician disagreement, supporting the view that crisis detection is an online monitoring problem rather than a solvable classification task.
- Introduces an architectural safety framework for mental-health chatbots in which conservative, independent risk detection enables an operational emergency mode that prioritizes safety while maintaining empathic engagement.
How this study might affect research, practice or policy
- This work reframes suicide- and crisis-risk detection in conversational AI as a safety-oriented, uncertainty-aware problem rather than an accuracy-driven prediction task, challenging prevailing evaluation practices.
- By proposing an architectural separation between risk detection and dialogue generation, it provides a practical, scalable framework for deploying mental-health chatbots that support just-in-time safety interventions without relying on extensive clinical training data.