Artificial intelligence in obstetrics and gynecology: Evaluating ChatGPT and Google Gemini in answering patient questions.

Journal: International journal of gynaecology and obstetrics: the official organ of the International Federation of Gynaecology and Obstetrics

Published Date: Oct 28, 2025

Abstract

INTRODUCTION: To evaluate the accuracy and completeness of responses across common obstetrical and gynecologic topics generated by the large language models (LLMs) ChatGPT and Google Gemini, which have become increasingly popular for patients seeking medical information before physician consultations.

METHODS: Ten topics were identified, five obstetrical (prenatal labs, extended carrier screen, treatments for nausea and vomiting in pregnancy, gestational diabetes, and trial of labor after cesarean section) and five gynecologic (polycystic ovary syndrome, pelvic inflammatory disease, cervical smears, mammograms, and birth control). For each topic, ChatGPT generated five of the most frequently asked patient questions, which were then presented separately to ChatGPT and Google Gemini. Board-certified Obstetrics and Gynecology physicians evaluated the responses using Likert scales for accuracy (1-6) and completeness (1-3).

RESULTS: Acceptable response criteria were defined as an accuracy score of 5 or greater ("nearly all correct") and a completeness score of 2 or greater ("adequately complete"). Most responses from both models met these thresholds. Wilcoxon signed-rank tests demonstrated statistically significant differences in accuracy and completeness between models (P < 0.05). Inter-rater agreement was measured using intraclass correlation coefficients. For obstetrical topics, the coefficients for ChatGPT were -0.047 (completeness) and 0.112 (accuracy), whereas those for Google Gemini were 0.367 and 0.205, respectively. For gynecologic topics, the coefficients for ChatGPT were 0.328 and 0.20, compared with 0.151 and -0.08 for Google Gemini.

CONCLUSION: Both LLMs provided largely accurate and complete responses to patient questions. ChatGPT demonstrated stronger outcomes overall, suggesting potential utility in patient education; however, patients should confirm online information with physicians given the limitations of LLMs.
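The acceptability criteria stated in the abstract (accuracy of 5 or greater on the 1-6 scale and completeness of 2 or greater on the 1-3 scale) can be expressed as a short check. The following is a minimal sketch, not the authors' analysis code: the names `Rating`, `is_acceptable`, and `acceptable_fraction` are hypothetical, and the ratings in the usage lines are made-up placeholders rather than study data.

```python
from dataclasses import dataclass
from typing import Sequence

# Thresholds taken from the abstract's stated criteria.
ACCURACY_MIN = 5      # "nearly all correct" on the 1-6 accuracy scale
COMPLETENESS_MIN = 2  # "adequately complete" on the 1-3 completeness scale


@dataclass(frozen=True)
class Rating:
    """One physician rating of one model response (hypothetical container)."""
    accuracy: int      # Likert score, 1-6
    completeness: int  # Likert score, 1-3


def is_acceptable(rating: Rating) -> bool:
    """Return True when a rating meets both thresholds."""
    if not 1 <= rating.accuracy <= 6:
        raise ValueError("accuracy must be between 1 and 6")
    if not 1 <= rating.completeness <= 3:
        raise ValueError("completeness must be between 1 and 3")
    return (rating.accuracy >= ACCURACY_MIN
            and rating.completeness >= COMPLETENESS_MIN)


def acceptable_fraction(ratings: Sequence[Rating]) -> float:
    """Fraction of ratings that meet both thresholds."""
    if not ratings:
        raise ValueError("ratings must not be empty")
    return sum(is_acceptable(r) for r in ratings) / len(ratings)


# Illustrative placeholder ratings only; these are NOT values from the study.
example = [Rating(6, 3), Rating(5, 1), Rating(5, 2), Rating(3, 2)]
print(acceptable_fraction(example))
```

Both conditions must hold at once, so a response rated fully accurate but incomplete (completeness of 1) would not count as acceptable under this reading of the criteria.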

Authors

Madeline West
Paul L. Foster School of Medicine, Texas Tech University Health Sciences Center, El Paso, Texas, USA.

Amir Alsaidi
Paul L. Foster School of Medicine, Texas Tech University Health Sciences Center, El Paso, Texas, USA.

Rohail Siddiqi
Paul L. Foster School of Medicine, Texas Tech University Health Sciences Center, El Paso, Texas, USA.

Fatima Sayyed
Paul L. Foster School of Medicine, Texas Tech University Health Sciences Center, El Paso, Texas, USA.

Rachael Counts
Department of Obstetrics & Gynecology, The University of Texas Health Science Center at San Antonio, San Antonio, Texas, USA.

Lauren Quinto
Department of Obstetrics & Gynecology, The University of Texas Health Science Center at San Antonio, San Antonio, Texas, USA.

Nicholas Stansbury
Department of Obstetrics & Gynecology, The University of Texas Health Science Center at San Antonio, San Antonio, Texas, USA.

Keywords

No keywords available for this article.

External Resources

View on PubMed Access via DOI PubMed (41147295)
