Large Language Models for Cancer Communication: Evaluating Linguistic Quality, Safety, and Accessibility in Generative AI
Journal:
arXiv
Published Date:
May 15, 2025
Abstract
Effective communication about breast and cervical cancers remains a
persistent health challenge, with significant gaps in public understanding of
cancer prevention, screening, and treatment, potentially leading to delayed
diagnoses and inadequate treatments. This study evaluates the capabilities and
limitations of Large Language Models (LLMs) in generating accurate, safe, and
accessible cancer-related information to support patient understanding. We
evaluated five general-purpose and three medical LLMs using a mixed-methods
evaluation framework across linguistic quality, safety and trustworthiness, and
communication accessibility and affectiveness. Our approach utilized
quantitative metrics, qualitative expert ratings, and statistical analysis
using Welch's ANOVA, Games-Howell, and Hedges' g. Our results show that
general-purpose LLMs produced outputs of higher linguistic quality and
affectiveness, while medical LLMs demonstrate greater communication
accessibility. However, medical LLMs tend to exhibit higher levels of potential
harm, toxicity, and bias, reducing their performance in safety and
trustworthiness. Our findings indicate a duality between domain-specific
knowledge and safety in health communications. The results highlight the need
for intentional model design with targeted improvements, particularly in
mitigating harm and bias, and improving safety and affectiveness. This study
provides a comprehensive evaluation of LLMs for cancer communication, offering
critical insights for improving AI-generated health content and informing
future development of accurate, safe, and accessible digital health tools.