Mitigating Automation Bias in Physician-LLM Diagnostic Reasoning Using Behavioral Nudges: A Randomized Controlled Trial

Journal: medRxiv

Published Date: Jun 2, 2026

Abstract

As large language models (LLMs) enter clinical workflows, automation bias, the uncritical acceptance of automated output, poses a patient-safety risk. Optimal physician-AI collaboration requires trust calibration: matching scrutiny to LLM recommendation accuracy. We report a randomized trial evaluating a behavioral nudge to mitigate automation bias. Seventy-two AI-trained physicians were randomized to evaluate six vignettes alongside ChatGPT-5.1 recommendations, consulted at each physician's discretion; three contained deliberate, clinically significant errors. The treatment arm received a dual-component nudge: an anchoring cue reporting ChatGPT's benchmark accuracy to calibrate expectations, and a case-specific selective-attention cue consisting of a numeric accuracy rating and a color-coded traffic light, derived from the mean of three distinct-family LLMs. The control group saw recommendations alone. Blinded reviewers scored diagnostic reasoning against an expert rubric. The treatment group scored significantly higher than the control group (mean difference, 7.6 percentage points; 95% CI, 1.4-13.9; P=0.016), suggesting a scalable strategy to preserve clinical judgment in LLM-assisted care. ClinicalTrials.gov registration: NCT07328815.
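
The selective-attention cue described in the abstract (a numeric rating plus a traffic-light color, derived from the mean of three LLMs) can be illustrated with a minimal sketch. This is not the trial's implementation: the 0-100 rating scale, the cutoffs `GREEN_MIN` and `AMBER_MIN`, and the function name `traffic_light` are all hypothetical, since the abstract does not report how the mean was mapped to a color.

```python
from statistics import mean

# Hypothetical cutoffs on an assumed 0-100 scale; the abstract does not state
# the thresholds actually used in the trial.
GREEN_MIN = 80.0
AMBER_MIN = 50.0


def traffic_light(ratings):
    """Average three per-case accuracy ratings and map the mean to a color.

    `ratings` is assumed to hold one 0-100 accuracy rating from each of
    three rater models for a single recommendation.
    """
    if len(ratings) != 3:
        raise ValueError("expected exactly three ratings")
    score = mean(ratings)
    if score >= GREEN_MIN:
        color = "green"
    elif score >= AMBER_MIN:
        color = "amber"
    else:
        color = "red"
    return score, color


if __name__ == "__main__":
    # Illustrative ratings only, not data from the study.
    print(traffic_light([90.0, 85.0, 95.0]))
    print(traffic_light([20.0, 35.0, 30.0]))
```

Averaging across raters from different model families is one simple way to dampen any single model's idiosyncratic errors; other aggregation rules (median, minimum) would be equally plausible under these assumptions.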

Authors

Qazi I. A.; Ali A.; Khawaja A. U.; Akhtar M. J.; Sheikh A. Z.; Alizai M. H.

