Assessing the Capability of Large Language Model Chatbots in Generating Plain Language Summaries.

Journal: Cureus

Published Date: Mar 21, 2025

Abstract

Background: Plain language summaries (PLSs) make scientific research accessible to a broad non-expert audience. However, crafting an effective PLS can be challenging, particularly for non-native English-speaking researchers. Large language model (LLM) chatbots have the potential to assist in generating summaries, but their effectiveness compared to human-written PLS remains underexplored.

Methods: This cross-sectional study compared 30 human-written PLS with PLS generated by six LLM chatbots: ChatGPT (OpenAI, San Francisco, CA), Claude (Anthropic, San Francisco, CA), Copilot (Microsoft Corp., Washington, DC), Gemini (Google, Mountain View, CA), Meta AI (Meta, Menlo Park, CA), and Perplexity (Perplexity AI, Inc., San Francisco, CA). Readability was measured by the Flesch Reading Ease (FR) score, and understandability was measured by the Flesch-Kincaid (FK) grade level. Three authors rated each text on seven predefined criteria, and their average score was used to compare the quality of the PLS.

Results: Compared with human-written PLS, the chatbots generated PLS with lower FK grade levels (p-value < 0.0001), and all chatbots except Copilot produced higher FR ease scores. The overall score of human-written PLS was 8.89±0.26. Although there was statistically significant variance among the scores (F = 7.16, p-value = 0.0012), the post-hoc test showed no difference between human-written PLS and the PLS generated by any individual chatbot (ChatGPT 8.8±0.34, Claude 8.89±0.33, Copilot 8.69±0.4, Gemini 8.56±0.56, Meta AI 8.98±0.23, and Perplexity 8.8±0.3).

Conclusion: LLM chatbots can generate PLS with better readability that a person with a lower level of education can understand. These PLS are of similar quality to those written by human authors. Hence, authors can use LLM chatbots to generate PLS, which is particularly beneficial for researchers in developing countries. Although LLM chatbots improve readability, they may also introduce minor inaccuracies. Hence, PLS generated by LLMs should always be checked for accuracy and relevance.
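The two readability measures named in the Methods follow standard published formulas, which can be sketched in a few lines of Python. This is only an illustration, not the tool the authors used: the abstract does not say how the scores were computed, and the helper names (`count_syllables`, `text_stats`) and the vowel-group syllable heuristic are assumptions made here. Dedicated readability tools count syllables more carefully, so their scores will differ somewhat.

```python
import re


def count_syllables(word: str) -> int:
    """Rough heuristic (an assumption, not a dictionary lookup):
    count groups of consecutive vowels and drop a silent trailing 'e'."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith(("le", "ee")) and count > 1:
        count -= 1
    return max(count, 1)


def text_stats(text: str) -> tuple[int, int, int]:
    """Return (sentences, words, syllables) for a piece of text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    return len(sentences), len(words), syllables


def flesch_reading_ease(text: str) -> float:
    """Flesch Reading Ease: higher scores mean easier text."""
    s, w, syl = text_stats(text)
    return 206.835 - 1.015 * (w / s) - 84.6 * (syl / w)


def flesch_kincaid_grade(text: str) -> float:
    """Flesch-Kincaid grade level: lower values mean less schooling is needed."""
    s, w, syl = text_stats(text)
    return 0.39 * (w / s) + 11.8 * (syl / w) - 15.59


if __name__ == "__main__":
    sample = "Plain language summaries help people read science. They use short words."
    print(round(flesch_reading_ease(sample), 2))
    print(round(flesch_kincaid_grade(sample), 2))
```

Both formulas depend on the same two ratios, words per sentence and syllables per word, so shorter sentences and shorter words raise the FR ease score and lower the FK grade level together.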

Authors

Himel Mondal
Department of Physiology, All India Institute of Medical Sciences, Deoghar, Jharkhand, India.

Gaurav Gupta
Department of Neurosurgery, Rutgers New Jersey Medical School, Newark, New Jersey, USA.

Pradosh Kumar Sarangi
Department of Radiodiagnosis, All India Institute of Medical Sciences, Deoghar, Jharkhand, India.

Shreya Sharma
Neuromodulation Laboratory/Physiology, All India Institute of Medical Sciences, Deoghar, Jharkhand, India.

Pritam K Choudhary
Neuromodulation Laboratory/Physiology, All India Institute of Medical Sciences, Deoghar, Jharkhand, India.

Ayesha Juhi
Physiology, All India Institute of Medical Sciences, Deoghar, Jharkhand, India.

Anita Kumari
Physiology, All India Institute of Medical Sciences, Deoghar, Jharkhand, India.

Shaikat Mondal
Department of Physiology, Raiganj Government Medical College and Hospital, West Bengal, India.


External Resources

PubMed ID (PMID): 40260353
