Evaluation of AI language models in answering pregnancy-related questions assessed by obstetrics specialists.

Journal: Scientific Reports

Published Date: Feb 16, 2026

Abstract

This study aimed to compare the performance of three large language models (ChatGPT-3.5, Gemini, and ChatGPT-4.0) in generating responses to ten frequently asked pregnancy-related questions, as evaluated by obstetrics and gynecology specialists. Seventy-five specialists independently rated 30 anonymized AI-generated responses using a 5-point Likert scale across four domains: accuracy, reliability, patient-friendliness, and comprehensibility. All questions were standardized and presented verbatim to each model using identical zero-shot prompts. Data were analyzed using the Kruskal-Wallis test with Bonferroni-adjusted Mann-Whitney U post-hoc comparisons. Inter-rater consistency was assessed using Cronbach's alpha. Spearman correlation was used to examine associations between clinical experience and evaluation patterns. ChatGPT-4.0 demonstrated the highest overall performance, particularly in accuracy (median 4.35; mean ± SD: 4.30 ± 0.48) and patient-friendliness (4.40; 4.35 ± 0.47). Gemini performed comparably to ChatGPT-4.0 in comprehensibility (3.70; 3.68 ± 0.54), while ChatGPT-3.5 consistently received the lowest scores. Significant differences were observed among the three models for accuracy, reliability, and patient-friendliness (all p < 0.001), but not for comprehensibility (p = 0.521). A modest positive correlation was found between clinical experience and reliability ratings (r = 0.261, p = 0.0238). Among the evaluated models, ChatGPT-4.0 provided the most clinically aligned and patient-friendly responses to common pregnancy questions. While AI tools may offer valuable support for patient education, expert oversight remains essential to ensure accuracy and safety. Further research should explore their real-world impact on patient comprehension, behavior, and clinical outcomes.
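The Bonferroni adjustment mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' analysis code. The family-wise significance level of 0.05 and the helper names `bonferroni_pairs` and `is_significant` are assumptions made for illustration, and no study data are used. With three models there are three pairwise comparisons, so each post-hoc test would be judged against a threshold of roughly 0.0167 under this assumption.

```python
from itertools import combinations

# The three models named in the abstract.
MODELS = ["ChatGPT-3.5", "Gemini", "ChatGPT-4.0"]

# Assumed family-wise significance level; the abstract does not state it.
ALPHA = 0.05


def bonferroni_pairs(models, alpha=ALPHA):
    """Return every pairwise comparison and the Bonferroni per-test threshold."""
    pairs = list(combinations(models, 2))
    return pairs, alpha / len(pairs)


def is_significant(raw_p, n_tests, alpha=ALPHA):
    """Equivalent view: scale the raw p-value by the number of tests, capped at 1."""
    adjusted_p = min(1.0, raw_p * n_tests)
    return adjusted_p < alpha


pairs, threshold = bonferroni_pairs(MODELS)
for first, second in pairs:
    print(f"{first} vs {second}: compare raw p against {threshold:.4f}")
```

In practice the raw p-values would come from a Mann-Whitney U test on each pair of models, run only for domains where the Kruskal-Wallis test indicates an overall difference.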
