Generative Artificial Intelligence Models in Clinical Infectious Disease Consultations: A Cross-Sectional Analysis Among Specialists and Resident Trainees.
Journal: Healthcare (Basel, Switzerland)
Published Date: Mar 27, 2025
Abstract
The potential of generative artificial intelligence (GenAI) to augment clinical consultation services in clinical microbiology and infectious diseases (ID) is being evaluated. This cross-sectional study evaluated the performance of four GenAI chatbots (GPT-4.0, a Custom Chatbot based on GPT-4.0, Gemini Pro, and Claude 2) by analysing 40 unique clinical scenarios. Six specialists and resident trainees from clinical microbiology or ID units conducted randomised, blinded evaluations across factual consistency, comprehensiveness, coherence, and medical harmfulness. Analysis showed that GPT-4.0 achieved significantly higher composite scores compared to Gemini Pro (p = 0.001) and Claude 2 (p = 0.006). GPT-4.0 outperformed Gemini Pro and Claude 2 in factual consistency (Gemini Pro, p = 0.02; Claude 2, p = 0.02), comprehensiveness (Gemini Pro, p = 0.04; Claude 2, p = 0.03), and the absence of medical harm (Gemini Pro, p = 0.02; Claude 2, p = 0.04). Within-group comparisons showed that specialists consistently awarded higher ratings than resident trainees across all assessed domains (p < 0.001) and overall composite scores (p < 0.001). Specialists were five times more likely to consider responses as "harmless". Overall, fewer than two-fifths of AI-generated responses were deemed "harmless". Post hoc analysis revealed that specialists may inadvertently disregard conflicting or inaccurate information in their assessments. Clinical experience and domain expertise of individual clinicians significantly shaped the interpretation of AI-generated responses. In our analysis, we have demonstrated disconcerting human vulnerabilities in safeguarding against potentially harmful outputs, which seemed to be most apparent among experienced specialists. At the current stage, none of the tested AI models should be considered safe for direct clinical deployment in the absence of human supervision.