Assessing the Capability of Large Language Model Chatbots in Generating Plain Language Summaries.

Journal: Cureus

Abstract

Background: Plain language summaries (PLSs) make scientific research accessible to a broad non-expert audience. However, crafting an effective PLS can be challenging, particularly for researchers who are non-native English speakers. Large language model (LLM) chatbots have the potential to assist in generating summaries, but their effectiveness compared with human-written PLSs remains underexplored.

Methods: This cross-sectional study compared 30 human-written PLSs with PLSs generated by six LLM chatbots: ChatGPT (OpenAI, San Francisco, CA), Claude (Anthropic, San Francisco, CA), Copilot (Microsoft Corp., Washington, DC), Gemini (Google, Mountain View, CA), Meta AI (Meta, Menlo Park, CA), and Perplexity (Perplexity AI, Inc., San Francisco, CA). Readability was assessed with the Flesch Reading Ease (FR) score, and understandability with the Flesch-Kincaid (FK) grade level. Three authors rated each text on seven predefined criteria, and their average score was used to compare the quality of the PLSs.

Results: Compared with human-written PLSs, the chatbots generated PLSs with lower FK grade levels (p-value < 0.0001), and all chatbots except Copilot had higher FR ease scores. The overall score of human-written PLSs was 8.89±0.26. Although the scores varied significantly across groups (F = 7.16, p-value = 0.0012), the post-hoc test found no difference between human-written PLSs and those generated by any individual chatbot (ChatGPT 8.8±0.34, Claude 8.89±0.33, Copilot 8.69±0.4, Gemini 8.56±0.56, Meta AI 8.98±0.23, and Perplexity 8.8±0.3).

Conclusion: LLM chatbots can generate PLSs that are more readable and that a person with a lower level of education can understand. These PLSs are of similar quality to those written by human authors. Authors can therefore use LLM chatbots to generate PLSs, which is particularly beneficial for researchers in developing countries. However, while LLM chatbots improve readability, they may also introduce minor inaccuracies, so LLM-generated PLSs should always be checked for accuracy and relevance.
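
For readers unfamiliar with the two readability measures, the sketch below shows how they are conventionally computed from word, sentence, and syllable counts. The abstract does not say which tool the authors used, so this is only an illustration: the function names are hypothetical, and the vowel-run syllable counter is a crude assumption that established readability tools replace with more careful rules.

```python
import re


def count_syllables(word: str) -> int:
    """Rough heuristic (an assumption): one syllable per run of vowels."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))


def text_counts(text: str) -> tuple[int, int, int]:
    """Return (words, sentences, syllables) using simple regex splitting."""
    words = re.findall(r"[A-Za-z']+", text)
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    syllables = sum(count_syllables(w) for w in words)
    return len(words), len(sentences), syllables


def flesch_reading_ease(words: int, sentences: int, syllables: int) -> float:
    """Standard Flesch Reading Ease formula; higher means easier to read."""
    if words == 0 or sentences == 0:
        raise ValueError("text must contain at least one word and one sentence")
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)


def flesch_kincaid_grade(words: int, sentences: int, syllables: int) -> float:
    """Standard Flesch-Kincaid formula; approximates a US school grade level."""
    if words == 0 or sentences == 0:
        raise ValueError("text must contain at least one word and one sentence")
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


if __name__ == "__main__":
    sample = "Plain language summaries help readers. They use short words."
    counts = text_counts(sample)
    print("FR ease score:", round(flesch_reading_ease(*counts), 2))
    print("FK grade level:", round(flesch_kincaid_grade(*counts), 2))
```

Both formulas depend on the same two ratios, words per sentence and syllables per word. Shorter sentences and shorter words raise the FR ease score and lower the FK grade level, which is the direction of change the study reports for most chatbot-generated PLSs.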

Authors

  • Himel Mondal
    Department of Physiology, All India Institute of Medical Sciences, Deoghar, Jharkhand, India.
  • Gaurav Gupta
    Department of Neurosurgery, Rutgers New Jersey Medical School, Newark, New Jersey.
  • Pradosh Kumar Sarangi
    Department of Radiodiagnosis, All India Institute of Medical Sciences, Deoghar, Jharkhand, India.
  • Shreya Sharma
    Neuromodulation Laboratory/Physiology, All India Institute of Medical Sciences, Deoghar, India.
  • Pritam K Choudhary
    Neuromodulation Laboratory/Physiology, All India Institute of Medical Sciences, Deoghar, India.
  • Ayesha Juhi
    Physiology, All India Institute of Medical Sciences, Deoghar, India.
  • Anita Kumari
    Physiology, All India Institute of Medical Sciences, Deoghar, India.
  • Shaikat Mondal
    Department of Physiology, Raiganj Government Medical College and Hospital, West Bengal, India.
