The Pursuit of Empathy: Evaluating Small Language Models for PTSD Dialogue Support
Journal: arXiv
Published Date: May 21, 2025
Abstract
Can small language models with 0.5B to 5B parameters meaningfully engage in
trauma-informed, empathetic dialogue for individuals with PTSD? We address this
question by introducing TIDE, a dataset of 10,000 two-turn dialogues spanning
500 diverse PTSD client personas and grounded in a three-factor empathy model:
emotion recognition, distress normalization, and supportive reflection. All
scenarios and reference responses were reviewed for realism and trauma
sensitivity by a clinical psychologist specializing in PTSD. We evaluate eight
small language models before and after fine-tuning, comparing their outputs to
a frontier model (Claude 3.5 Sonnet). Our IRB-approved human evaluation and
automatic metrics show that fine-tuning generally improves perceived empathy,
but gains are highly scenario- and user-dependent, with smaller models facing
an empathy ceiling. Demographic analysis shows older adults value distress
validation and graduate-educated users prefer nuanced replies, while gender
effects are minimal. We highlight the limitations of automatic metrics and the
need for context- and user-aware system design. Our findings, along with the
planned release of TIDE, provide a foundation for building safe,
resource-efficient, and ethically sound empathetic AI to supplement, not
replace, clinical mental health care.