The Cognitive Safety Net: Comparing Human and AI Diagnostic Reasoning during Complex Clinical Situations
Journal: medRxiv
Published Date: Jan 1, 2025
Abstract
Diagnostic error in high-stakes clinical environments remains a significant cause of preventable harm. While a new generation of customisable digital cognitive aids (cDCAs) has shown a capacity to improve performance, achieve robust competence, and double learning retention, the potential for artificial intelligence (AI) to augment the foundational, anticipatory reasoning that precedes action is not well understood. This study aims to compare the diagnostic reasoning strategies of experienced anaesthesiology residents with those of a large language model (LLM) during a realistic, complex, simulated anaesthesiology scenario. We conducted a comparative analysis within a high-fidelity simulation randomised controlled trial (Anticipamax, NCT06487208). Thirty-four experienced anaesthesiology residents and a conversational LLM (ChatGPT-4) managed a perioperative shock of deliberately multifactorial aetiology. Diagnostic lotteries (sets of hypotheses with assigned plausibility scores) were collected before and after the simulation. We implemented a novel analytical framework, based on the Condorcet method from social choice theory, both to rank individual hypotheses and to compare complete diagnostic strategies as the case evolved. The AI and the residents demonstrated distinct reasoning profiles. Initially, the AI produced an exhaustive, non-hierarchical analysis, correctly identifying septic shock among several top hypotheses with similar scores. Residents, in contrast, employed a pragmatic, focused strategy: they prioritised immediate surgical risks, unanimously identified an experience-based risk (gas embolism) that the AI systematically overlooked, and consistently reserved a portion of their reasoning for uncertainty, termed ‘Place for Doubt’. After the clinical evolution, both the AI and the residents converged on septic shock. 
A ‘complex scrutiny’ analysis of the overall strategies consistently ranked the residents’ focused and adaptive reasoning as strategically superior to the AI’s exhaustive but diluted approach. AI demonstrates a powerful capacity for broad diagnostic anticipation and can act as a safeguard against premature diagnostic closure. Experienced residents exhibit a reasoning process that is strategically superior in its focus and adaptation. Our findings support a synergistic model in which the AI serves as a ‘Cognitive Safety Net’ that augments, rather than replaces, the contextualised judgment of the human practitioner.

Human error in healthcare is a prominent cause of death worldwide. ‘Traditional’ cognitive support tools (e.g., paper checklists) have been shown to improve technical skills during medical crises, but their impact on non-technical skills is limited and their clinical adoption remains low. A new generation of customisable digital cognitive aids (cDCAs) can significantly improve both technical and non-technical performance, fostering better team management and crisis resolution. Little is known about how clinicians achieve the best anticipatory clinical reasoning. Recent work comparing machine-learning models with clinicians in trauma triage found comparable accuracy but only moderate agreement, suggesting a collaborative paradigm and motivating deeper analyses of the reasoning process itself. However, a critical gap remains in understanding the underlying nature of the diagnostic reasoning strategies that lead to these outcomes. The ‘how’ of human and AI reasoning, especially in dynamic, anticipatory clinical tasks, is not well understood.

This is the first study to directly compare, in action, the diagnostic reasoning strategies of clinicians and a large language model (LLM). 
It introduces a novel analytical framework, based on the Condorcet method from social choice theory, that moves beyond simple performance scores to rigorously model and rank the overall quality of diagnostic strategies in a simulated complex situation typical of daily practice. The findings support a model of human-AI complementarity in which the AI excels at broad, exhaustive analysis while clinicians demonstrate superior, focused and adaptive strategic reasoning, suggesting a role for the human as a meta-cognitive supervisor of the AI’s exhaustive but ‘diluted’ insights.
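The Condorcet-based ranking of hypotheses described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, because the abstract does not specify how lotteries were turned into ballots. The sketch assumes that each diagnostic lottery acts as a ranked ballot ordered by plausibility score, that a hypothesis absent from a lottery scores zero, and that the ‘Place for Doubt’ share is excluded from the candidates. The names `Lottery`, `DOUBT`, `pairwise_wins`, `condorcet_winner` and `copeland_ranking` are hypothetical. The three example lotteries are invented for illustration and are not study data. A Condorcet winner need not exist when pairwise majorities form a cycle, so the sketch also uses Copeland scoring (a Condorcet-consistent method, swapped in here as an assumption) to produce a full ranking.

```python
from itertools import combinations
from typing import Optional

# Hypothetical format: a diagnostic lottery maps hypothesis -> plausibility score.
Lottery = dict[str, float]

# Hypothetical label for the share reserved for uncertainty ("Place for Doubt");
# this sketch assumes it is not a diagnosis and excludes it from the ranking.
DOUBT = "place for doubt"


def candidates(lotteries: list[Lottery]) -> list[str]:
    """All hypotheses named in any lottery, excluding the doubt share."""
    return sorted({h for lot in lotteries for h in lot if h != DOUBT})


def pairwise_wins(lotteries: list[Lottery]) -> dict[tuple[str, str], int]:
    """wins[(a, b)] = number of lotteries scoring a strictly above b.
    Assumption: a hypothesis missing from a lottery scores 0 in it."""
    hyps = candidates(lotteries)
    wins = {(a, b): 0 for a in hyps for b in hyps if a != b}
    for lot in lotteries:
        for a, b in combinations(hyps, 2):
            sa, sb = lot.get(a, 0.0), lot.get(b, 0.0)
            if sa > sb:
                wins[(a, b)] += 1
            elif sb > sa:
                wins[(b, a)] += 1
    return wins


def condorcet_winner(lotteries: list[Lottery]) -> Optional[str]:
    """Hypothesis that beats every other one by pairwise majority, else None."""
    wins = pairwise_wins(lotteries)
    hyps = candidates(lotteries)
    for a in hyps:
        if all(wins[(a, b)] > wins[(b, a)] for b in hyps if b != a):
            return a
    return None  # no winner, e.g. a majority cycle


def copeland_ranking(lotteries: list[Lottery]) -> list[str]:
    """Full ranking by Copeland score: +1 per pairwise win, -1 per loss."""
    wins = pairwise_wins(lotteries)
    hyps = candidates(lotteries)
    score = {h: 0 for h in hyps}
    for a, b in combinations(hyps, 2):
        if wins[(a, b)] > wins[(b, a)]:
            score[a] += 1
            score[b] -= 1
        elif wins[(b, a)] > wins[(a, b)]:
            score[b] += 1
            score[a] -= 1
    # Ties broken alphabetically so the output is deterministic.
    return sorted(hyps, key=lambda h: (-score[h], h))


# Invented example lotteries (illustrative only, not data from the study).
residents = [
    {"septic shock": 0.4, "gas embolism": 0.3, "haemorrhage": 0.2, DOUBT: 0.1},
    {"gas embolism": 0.4, "septic shock": 0.3, "haemorrhage": 0.2, DOUBT: 0.1},
    {"septic shock": 0.5, "haemorrhage": 0.3, DOUBT: 0.2},
]
print(condorcet_winner(residents))  # septic shock
print(copeland_ranking(residents))  # ['septic shock', 'gas embolism', 'haemorrhage']
```

In this toy example septic shock wins every pairwise contest and is therefore the Condorcet winner, while gas embolism ranks second because two of the three lotteries score it above haemorrhage.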