Anatomical Accuracy of Generative AI for Congenital Heart Disease Illustrations: Gemini NanoBanana Versus ChatGPT Models in a Blinded Comparative Study

Journal: medRxiv

Published Date: Feb 23, 2026

Abstract

Background: Generative artificial intelligence (AI) systems are increasingly used to produce medical illustrations for education; however, their anatomical accuracy in complex domains such as congenital heart disease (CHD) remains insufficiently validated.

Methods: In an assessor-blinded comparative study, we evaluated AI-generated CHD illustrations from two contemporary text-to-image platforms (ChatGPT-5/ChatGPT-Images and Gemini NanoBanana) against human-modified educational images. Twenty different CHD types were included, yielding 147 images that were assessed by 20 physicians (10 CHD experts and 10 non-specialists). Images were rated across four domains: anatomical accuracy, label usefulness, visual attractiveness, and suitability for medical education (total score range, 4-12).

Results: Among 2,940 total image evaluations, the human-modified images demonstrated the highest anatomical accuracy (48.3% rated accurate), followed by NanoBanana (22.7%), while ChatGPT-generated images were predominantly rated as fabricated or incorrect (86.3% for ChatGPT-5 and 85.2% for ChatGPT-Images; p<0.001). Educational usability "as is" was highest for the human-modified images (37.9%) compared with NanoBanana (13.1%) and ChatGPT platforms (≤2.1%; p<0.001). Median overall quality scores were 8 for the human-modified CHD images and NanoBanana, versus 4 for both ChatGPT systems (p<0.001). In multivariable analysis, NanoBanana images were the closest to the human-modified images in quality (95% CI, 0.91-0.98), while ChatGPT-Images (95% CI, 0.58-0.63) and ChatGPT-5 (95% CI, 0.55-0.59) showed marked quality reductions.

Conclusions: The current generative AI systems produced visually compelling but frequently anatomically inaccurate CHD illustrations, falling substantially short of the current educational standards. Model choice strongly influences performance, with Gemini NanoBanana outperforming ChatGPT-based systems yet remaining inferior to expert-designed human-modified images. AI-generated cardiac imagery should be used only within expert-reviewed educational workflows rather than as independent instructional resources.
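The study-design figures in the abstract can be cross-checked with a little arithmetic. The sketch below is illustrative only and rests on two assumptions the abstract does not state outright: that every physician rated every image, and that all four domains share the same per-domain scale. The function names are hypothetical.

```python
# Figures quoted in the abstract.
N_IMAGES = 147            # images across 20 CHD types
N_RATERS = 20             # 10 CHD experts + 10 non-specialists
N_DOMAINS = 4             # accuracy, labels, attractiveness, suitability
TOTAL_MIN, TOTAL_MAX = 4, 12  # reported total score range


def total_evaluations(images: int, raters: int) -> int:
    """Number of image evaluations, ASSUMING every rater scored every image."""
    return images * raters


def per_domain_range(total_min: int, total_max: int, domains: int) -> tuple[int, int]:
    """Per-domain score range, ASSUMING all domains use one identical scale."""
    if total_min % domains or total_max % domains:
        raise ValueError("total range is not evenly divisible across domains")
    return total_min // domains, total_max // domains


if __name__ == "__main__":
    print(total_evaluations(N_IMAGES, N_RATERS))
    print(per_domain_range(TOTAL_MIN, TOTAL_MAX, N_DOMAINS))
```

Under these assumptions the product of images and raters matches the 2,940 evaluations reported in the Results, and the total range of 4-12 corresponds to a three-point scale per domain.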

Authors

Alhuzaimi A.; Alkanhal A.; Alruwaili A. R. S.; Alharbi N. S.; Alfares F.; Aldekhyyel R. N.; Binkheder S.; Temsah A.; Aljamaan F.; Shahzad M.; Albriek A. Z.; Alanazi F. I.; Alhindi D. A.; Al-khatib S. M.; Darweesh A. A.; Altamimi I.; Jamal A.; Saad K.; Alhasan K.; Al-Eyadhy A.; Malki K. H.; Temsah M.-H.

