Assessment of Generative AI Chatbots for Neonatal Diagnostic Reasoning: A Comparative Concordance and Diagnosis Acceptability Analysis.

Journal: Indian Journal of Pediatrics
Published Date:

Abstract

OBJECTIVES: To assess the concordance and diagnosis acceptability of differential diagnoses (DDs) generated by artificial intelligence (AI) chatbots compared with those defined by expert neonatologists.

METHODS: Thirteen neonatology clinical cases were developed, each with five expert-defined DDs. Seven AI chatbots (ChatGPT, Claude, Copilot, Deepseek, Gemini, Grok, Perplexity) were asked to provide their most probable diagnosis and five DDs for each case. Four neonatologists evaluated the chatbot responses. Concordance scores were assigned according to how the chatbot diagnoses ranked against the expert-defined diagnoses. Diagnosis acceptability scores were based on the number of DDs deemed clinically acceptable.

RESULTS: Neither the concordance scores (P = 0.205) nor the diagnosis acceptability scores (P = 0.297) differed significantly across the seven chatbots. When compared against hypothetical values of 5, 4.5, and 4, the chatbots' scores differed significantly, and none of the chatbots achieved the desired concordance score of 80%. For diagnosis acceptability, however, ChatGPT, Claude, and Grok had scores similar to 90% diagnosis acceptability, while Copilot, Deepseek, Gemini, and Perplexity had 80% diagnosis acceptability.

CONCLUSIONS: Concordance and diagnosis acceptability scores did not differ significantly across the seven chatbots. While none achieved the desired concordance of 80%, several models showed diagnosis acceptability near 80-90%. Hence, chatbots should be integrated with caution.
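The link between the hypothetical values (5, 4.5, 4) and the percentages quoted in the results can be illustrated with a small sketch. It assumes, which the abstract does not state explicitly, that each case is scored from 0 to 5 because five DDs are requested per case, so that 5, 4.5, and 4 correspond to 100%, 90%, and 80%. The function names and the example judgements are hypothetical and are not study data.

```python
from statistics import mean

MAX_SCORE = 5  # assumption: five DDs per case, so a case scores 0 to 5


def acceptability_score(accepted_flags):
    """Count the DDs a reviewer marked clinically acceptable for one case."""
    return sum(1 for flag in accepted_flags if flag)


def as_percentage(score, max_score=MAX_SCORE):
    """Express a score as a percentage of the assumed maximum."""
    return 100.0 * score / max_score


# Hypothetical values named in the abstract, as percentages of the maximum.
benchmarks = {value: as_percentage(value) for value in (5, 4.5, 4)}

# Made-up reviewer judgements for three cases (illustrative only).
example_cases = [
    [True, True, True, True, False],
    [True, True, True, True, True],
    [True, True, True, False, True],
]
example_mean = mean(acceptability_score(case) for case in example_cases)

print(benchmarks)
print(example_mean, as_percentage(example_mean))
```

Under this assumed scale, a chatbot's mean score per case can be read directly against the 80% and 90% thresholds; the statistical test used for that comparison is not named in the abstract and is not reproduced here.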

Authors

  • Gaurav Gupta
    Department of Neurosurgery, Rutgers New Jersey Medical School, Newark, New Jersey.
  • Somashekhar Nimbalkar
    Department of Neonatology, Pramukhswami Medical College, Bhaikaka University, Gujarat, India.
  • Rajesh Kumar
    Johns Hopkins University (JHU), Baltimore, MD 21218, USA.
  • Poonam Singh
    ICMR-National Institute of Malaria Research, Dwarka sector 8, Delhi 110077, India.
  • Himel Mondal
    Department of Physiology, All India Institute of Medical Sciences, Deoghar, Jharkhand, India.
