Comparison of physician and large language model chatbot responses to online ear, nose, and throat inquiries.

Journal: Scientific Reports
Published Date:

Abstract

Large language models (LLMs) can potentially enhance the accessibility and quality of medical information. This study evaluates the reliability and quality of responses generated by ChatGPT-4, an LLM-driven chatbot, compared to those written by physicians, focusing on otorhinolaryngological advice in real-world, text-based workflows. Responses from a public social media forum were anonymized, and ChatGPT-4 generated corresponding replies. A panel of seven board-certified otorhinolaryngologists assessed both sets of responses using six criteria: overall quality, empathy, alignment with medical consensus, information accuracy, inquiry comprehension, and harm potential. Ordinal logistic regression analysis identified factors influencing response quality. ChatGPT-4 responses were preferred in 70.7% of cases and were significantly longer (median: 162 words) than physician responses (median: 67 words; P < .0001). The chatbot's responses received higher ratings across all criteria, with key predictors of this higher quality being greater empathy, stronger alignment with medical consensus, lower potential for harm, and fewer inaccuracies. ChatGPT-4 consistently outperformed physicians in generating responses that adhered to medical consensus, demonstrated accuracy, and conveyed empathy. These findings suggest that integrating AI tools into text-based healthcare consultations could help physicians better address complex, nuanced inquiries and provide high-quality, comprehensive medical advice.

Authors

  • Masaomi Motegi
    Department of Otolaryngology-Head and Neck Surgery, Gunma University Graduate School of Medicine, 3-39-15 Showamachi, Maebashi, Gunma, 371-8511, Japan. m_motegi@gunma-u.ac.jp.
  • Masato Shino
    Department of Otolaryngology-Head and Neck Surgery, Gunma University Graduate School of Medicine, 3-39-15 Showamachi, Maebashi, Gunma, 371-8511, Japan.
  • Mikio Kuwabara
    Department of Otolaryngology-Head and Neck Surgery, Gunma University Graduate School of Medicine, 3-39-15 Showamachi, Maebashi, Gunma, 371-8511, Japan.
  • Hideyuki Takahashi
    Department of Systems Innovation, Graduate School of Engineering Science, Osaka University, Toyonaka, Osaka, 560-0043, Japan.
  • Toshiyuki Matsuyama
    Department of Otolaryngology-Head and Neck Surgery, Gunma University Graduate School of Medicine, 3-39-15 Showamachi, Maebashi, Gunma, 371-8511, Japan.
  • Hiroe Tada
    Department of Otolaryngology-Head and Neck Surgery, Gunma University Graduate School of Medicine, 3-39-15 Showamachi, Maebashi, Gunma, 371-8511, Japan.
  • Hiroyuki Hagiwara
    Department of Otolaryngology-Head and Neck Surgery, Gunma University Graduate School of Medicine, 3-39-15 Showamachi, Maebashi, Gunma, 371-8511, Japan.
  • Kazuaki Chikamatsu
    Department of Otolaryngology-Head and Neck Surgery, Gunma University Graduate School of Medicine, 3-39-15 Showamachi, Maebashi, Gunma, 371-8511, Japan.