Artificial intelligence in obstetrics and gynecology: Evaluating ChatGPT and Google Gemini in answering patient questions.

Journal: International Journal of Gynaecology and Obstetrics: the official organ of the International Federation of Gynaecology and Obstetrics

Abstract

INTRODUCTION: Patients increasingly use the large language models (LLMs) ChatGPT and Google Gemini to seek medical information before physician consultations. This study evaluated the accuracy and completeness of the responses these models generate across common obstetrical and gynecologic topics.

METHODS: Ten topics were identified: five obstetrical (prenatal labs, extended carrier screen, treatments for nausea and vomiting in pregnancy, gestational diabetes, and trial of labor after cesarean section) and five gynecologic (polycystic ovary syndrome, pelvic inflammatory disease, cervical smears, mammograms, and birth control). For each topic, ChatGPT generated five of the most frequently asked patient questions, which were then presented separately to ChatGPT and Google Gemini. Board-certified obstetrics and gynecology physicians rated the responses on Likert scales for accuracy (1-6) and completeness (1-3).

RESULTS: An acceptable response was defined as one with an accuracy score of 5 or greater ("nearly all correct") and a completeness score of 2 or greater ("adequately complete"). Most responses from both models met these thresholds. Wilcoxon signed-rank tests showed statistically significant differences between the models in both accuracy and completeness (P < 0.05). Inter-rater agreement was measured using intraclass correlation coefficients (ICCs). For obstetrical topics, the ICCs for ChatGPT were -0.047 (completeness) and 0.112 (accuracy), whereas those for Google Gemini were 0.367 and 0.205, respectively. For gynecologic topics, the ICCs for ChatGPT were 0.328 and 0.20, compared with 0.151 and -0.08 for Google Gemini.

CONCLUSION: Both LLMs provided largely accurate and complete responses to patient questions. ChatGPT demonstrated stronger outcomes overall, suggesting potential utility in patient education; however, given the limitations of LLMs, patients should confirm online information with their physicians.
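The acceptability criteria stated in the abstract can be illustrated with a minimal Python sketch. The thresholds (accuracy of 5 or greater on the 1-6 scale, completeness of 2 or greater on the 1-3 scale) come from the abstract; everything else is an assumption made for illustration. The names `Rating`, `is_acceptable`, and `acceptable_fraction` are hypothetical, the example ratings are invented and are not study data, and the sketch judges each rating on its own because the abstract does not say how scores from multiple reviewers were combined.

```python
from dataclasses import dataclass
from typing import Sequence

# Thresholds as stated in the abstract.
ACCURACY_MIN = 5      # 1-6 accuracy scale; 5 = "nearly all correct"
COMPLETENESS_MIN = 2  # 1-3 completeness scale; 2 = "adequately complete"


@dataclass(frozen=True)
class Rating:
    """One reviewer's Likert ratings for one model response (hypothetical type)."""
    accuracy: int      # 1-6
    completeness: int  # 1-3


def is_acceptable(rating: Rating) -> bool:
    """Return True if the rating meets both thresholds from the abstract."""
    if not 1 <= rating.accuracy <= 6:
        raise ValueError("accuracy must be between 1 and 6")
    if not 1 <= rating.completeness <= 3:
        raise ValueError("completeness must be between 1 and 3")
    return (rating.accuracy >= ACCURACY_MIN
            and rating.completeness >= COMPLETENESS_MIN)


def acceptable_fraction(ratings: Sequence[Rating]) -> float:
    """Fraction of ratings that meet both thresholds."""
    if not ratings:
        raise ValueError("ratings must not be empty")
    return sum(is_acceptable(r) for r in ratings) / len(ratings)


# Invented ratings for illustration only; NOT data from the study.
example = [Rating(6, 3), Rating(5, 2), Rating(4, 3), Rating(5, 1)]
print(acceptable_fraction(example))  # 0.5
```

In this invented example, two of the four ratings meet both thresholds: the third falls short on accuracy and the fourth on completeness.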

Authors

  • Madeline West
    Paul L. Foster School of Medicine, Texas Tech University Health Sciences Center, El Paso, Texas, USA.
  • Amir Alsaidi
    Paul L. Foster School of Medicine, Texas Tech University Health Sciences Center, El Paso, Texas, USA.
  • Rohail Siddiqi
    Paul L. Foster School of Medicine, Texas Tech University Health Sciences Center, El Paso, Texas, USA.
  • Fatima Sayyed
    Paul L. Foster School of Medicine, Texas Tech University Health Sciences Center, El Paso, Texas, USA.
  • Rachael Counts
    Department of Obstetrics & Gynecology, The University of Texas Health Science Center at San Antonio, San Antonio, Texas, USA.
  • Lauren Quinto
    Department of Obstetrics & Gynecology, The University of Texas Health Science Center at San Antonio, San Antonio, Texas, USA.
  • Nicholas Stansbury
    Department of Obstetrics & Gynecology, The University of Texas Health Science Center at San Antonio, San Antonio, Texas, USA.