Artificial intelligence in obstetrics and gynecology: Evaluating ChatGPT and Google Gemini in answering patient questions.
Journal:
International journal of gynaecology and obstetrics: the official organ of the International Federation of Gynaecology and Obstetrics
Published Date:
Oct 28, 2025
Abstract
INTRODUCTION: To evaluate the accuracy and completeness of responses across common obstetrical and gynecologic topics generated by the large language models (LLMs) ChatGPT and Google Gemini, which have become increasingly popular for patients seeking medical information before physician consultations. METHODS: Ten topics were identified, five obstetrical (prenatal labs, extended carrier screen, treatments for nausea and vomiting in pregnancy, gestational diabetes, and trial of labor after cesarean section) and five gynecologic (polycystic ovary syndrome, pelvic inflammatory disease, cervical smears, mammograms, and birth control). For each condition, ChatGPT generated five of the most frequently asked patient questions, which were then presented separately to ChatGPT and Google Gemini. Board-certified Obstetrics and Gynecology physicians evaluated the responses using Likert scales for accuracy (1-6) and completeness (1-3). RESULTS: Acceptable response criteria were defined as an accuracy score of 5 or greater ("nearly all correct") and a completeness score of 2 or greater ("adequately complete"). Most responses from both models met these thresholds. Wilcoxon signed-rank tests demonstrated statistically significant differences in accuracy and completeness between models (P < 0.05). Inter-rater agreement was measured using intraclass correlation coefficients. For obstetrical topics, ChatGPT scored -0.047 (completeness) and 0.112 (accuracy), whereas Google Gemini scored 0.367 and 0.205, respectively. For gynecologic topics, ChatGPT scored 0.328 and 0.20, compared with Google Gemini at 0.151 and -0.08. CONCLUSION: Both LLMs provided largely accurate and complete responses to patient questions. ChatGPT demonstrated stronger outcomes overall, suggesting potential utility in patient education; however, patients should confirm online information with physicians given the limitations of LLMs.
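The acceptability criterion stated in the abstract (accuracy of 5 or greater on the 1-6 scale and completeness of 2 or greater on the 1-3 scale) can be illustrated with a minimal Python sketch. This is not the authors' analysis code: the names `Rating`, `is_acceptable`, and `acceptable_fraction` are hypothetical, and the example ratings are invented for illustration rather than taken from the study.

```python
from dataclasses import dataclass

# Thresholds as stated in the abstract.
ACCURACY_MIN = 5      # "nearly all correct" on the 1-6 accuracy scale
COMPLETENESS_MIN = 2  # "adequately complete" on the 1-3 completeness scale


@dataclass(frozen=True)
class Rating:
    """One reviewer's Likert scores for one model response (hypothetical type)."""
    accuracy: int      # 1-6
    completeness: int  # 1-3

    def __post_init__(self):
        # Reject scores outside the Likert ranges described in the abstract.
        if not 1 <= self.accuracy <= 6:
            raise ValueError("accuracy must be between 1 and 6")
        if not 1 <= self.completeness <= 3:
            raise ValueError("completeness must be between 1 and 3")


def is_acceptable(rating: Rating) -> bool:
    """Return True when both thresholds are met."""
    return (rating.accuracy >= ACCURACY_MIN
            and rating.completeness >= COMPLETENESS_MIN)


def acceptable_fraction(ratings: list[Rating]) -> float:
    """Return the share of ratings that meet both thresholds."""
    if not ratings:
        raise ValueError("at least one rating is required")
    return sum(is_acceptable(r) for r in ratings) / len(ratings)


# Invented ratings for illustration only; NOT data from the study.
example_ratings = [Rating(6, 3), Rating(5, 2), Rating(4, 3), Rating(5, 1)]
print(acceptable_fraction(example_ratings))
```

In this invented example, two of the four ratings meet both thresholds: the third falls short on accuracy and the fourth on completeness.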