Comparison of physician and large language model chatbot responses to online ear, nose, and throat inquiries.
Journal:
Scientific Reports
Published Date:
Jul 1, 2025
Abstract
Large language models (LLMs) can potentially enhance the accessibility and quality of medical information. This study evaluates the reliability and quality of responses generated by ChatGPT-4, an LLM-driven chatbot, compared with those written by physicians, focusing on otorhinolaryngological advice in real-world, text-based workflows. Physician responses to inquiries posted on a public social media forum were anonymized, and ChatGPT-4 generated replies to the same inquiries. A panel of seven board-certified otorhinolaryngologists assessed both sets of responses on six criteria: overall quality, empathy, alignment with medical consensus, information accuracy, inquiry comprehension, and potential for harm. Ordinal logistic regression identified factors influencing response quality. ChatGPT-4 responses were preferred in 70.7% of cases and were significantly longer (median 162 words) than physician responses (median 67 words; P < .0001). The chatbot's responses received higher ratings across all criteria, with the key predictors of higher quality being greater empathy, stronger alignment with medical consensus, lower potential for harm, and fewer inaccuracies. ChatGPT-4 consistently outperformed physicians in generating responses that adhered to medical consensus, were accurate, and conveyed empathy. These findings suggest that integrating AI tools into text-based healthcare consultations could help physicians address complex, nuanced inquiries and provide high-quality, comprehensive medical advice.
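The abstract states that ordinal logistic regression was used to identify predictors of overall response quality. As an illustration only, and not the authors' analysis code, the sketch below fits a proportional-odds (cumulative logit) model to simulated panel ratings; the variable names, rating scales, sample size, and data are hypothetical assumptions.

```python
# Minimal sketch of an ordinal logistic regression on reviewer ratings.
# All data below are simulated; the real study used ratings from seven
# otorhinolaryngologists, not this synthetic set.
import numpy as np
import pandas as pd
from statsmodels.miscmodels.ordinal_model import OrderedModel

rng = np.random.default_rng(0)
n = 200  # hypothetical number of rated responses

# Hypothetical predictor ratings on 1-5 scales
empathy = rng.integers(1, 6, n)
consensus = rng.integers(1, 6, n)
harm = rng.integers(1, 6, n)

# Simulate an ordinal overall-quality score loosely driven by the predictors
latent = 0.8 * empathy + 1.0 * consensus - 0.6 * harm + rng.normal(0, 1, n)
overall = pd.cut(latent, bins=5, labels=[1, 2, 3, 4, 5]).astype(int)

df = pd.DataFrame({
    "overall": overall,
    "empathy": empathy,
    "consensus": consensus,
    "harm": harm,
})

# Proportional-odds model: which criteria predict higher overall quality?
model = OrderedModel(
    df["overall"],
    df[["empathy", "consensus", "harm"]],
    distr="logit",
)
result = model.fit(method="bfgs", disp=False)
print(result.summary())
```

In a setup like this, positive coefficients (e.g., on empathy or consensus) indicate higher odds of a better overall-quality rating, while negative coefficients (e.g., on harm potential) indicate the opposite, which mirrors the direction of the predictors reported in the abstract.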