Comparing the Readability and Usability of Patient Education Materials Generated by Different Large Language Models: ChatGPT, Copilot, and Gemini.

Journal: The Journal of surgical research
Published Date:

Abstract

INTRODUCTION: Patients with low health literacy face challenges in understanding and navigating surgical care, leading to surgical disparities. The rising utilization of large language models (LLMs) may provide a scalable way to enhance patient education materials (PEMs) and improve understanding. This study aims to assess the readability and usability of PEMs generated by publicly available LLMs. METHODS: We identified existing colorectal PEMs from an academic health center, including preoperative, postoperative, and ostomy care materials. Using a previously optimized metric-based prompt, we generated de novo materials from three LLMs: ChatGPT-3.5, Copilot, and Gemini. All materials were assessed for readability through Flesch-Kincaid reading ease (ease), Flesch-Kincaid grade level (grade level), and modified grade-level scores. Usability was assessed through understandability and actionability with the Patient Education Material Assessment Tool. Bivariate analyses were conducted using t-tests. RESULTS: In total, 208 education materials were analyzed from the baseline set and the three LLMs, with an average word count of 844-869 (baseline), 259-271 (ChatGPT), 163-223 (Copilot), and 275-319 (Gemini). Gemini-generated materials demonstrated improved readability (grade level 5.9; P < 0.001) compared with baseline (7.7), whereas ChatGPT (12.5) and Copilot (8.8) performed worse (both P < 0.001). Although all materials scored above 70% for understandability, LLMs performed worse than baseline for understandability (75%-83% versus 75%-100%) and actionability (40%-80% versus 80%-100%; P < 0.001). CONCLUSIONS: Significant variability in LLM performance was identified when generating de novo PEMs. While Gemini showed improvement in readability and all LLMs achieved understandability target scores, existing baseline materials are still superior in both understandability and actionability. Despite the potential of LLMs to improve readability and usability, utilization should be balanced with clinical expertise.
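
The readability metrics named in the Methods can be illustrated with a minimal sketch. It uses the standard published Flesch reading ease and Flesch-Kincaid grade level formulas; the abstract does not describe the tool the authors used, so the function names, the tokenization, and the vowel-group syllable counter below are assumptions for illustration only, and the modified grade-level score is not covered. The syllable heuristic is crude and will differ from validated readability software.

```python
import re


def flesch_reading_ease(words, sentences, syllables):
    """Standard Flesch reading ease formula; higher means easier to read."""
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)


def flesch_kincaid_grade(words, sentences, syllables):
    """Standard Flesch-Kincaid grade level formula (approximate US school grade)."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


def count_syllables(word):
    """Assumed heuristic: one syllable per run of consecutive vowels, minimum one."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))


def text_counts(text):
    """Return (words, sentences, syllables) using simple regex tokenization."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return len(words), len(sentences), syllables


def readability(text):
    """Compute both scores for a piece of text (hypothetical helper)."""
    words, sentences, syllables = text_counts(text)
    if words == 0 or sentences == 0:
        raise ValueError("text must contain at least one word and one sentence")
    return {
        "ease": flesch_reading_ease(words, sentences, syllables),
        "grade_level": flesch_kincaid_grade(words, sentences, syllables),
    }


if __name__ == "__main__":
    # Illustrative sentence only, not taken from the study materials.
    sample = "Keep the bag dry. Call us if it leaks."
    print(readability(sample))
```

Both formulas depend only on average sentence length and average syllables per word, so shorter sentences and simpler words lower the grade level and raise the ease score.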

Authors
