H2HTalk: Evaluating Large Language Models as Emotional Companion
Journal: arXiv
Published Date: Jul 4, 2025
Abstract
As demand for digital emotional support grows, Large Language Model (LLM)
companions promise authentic, always-available empathy, yet rigorous evaluation
lags behind model advancement. We present Heart-to-Heart Talk (H2HTalk), a
benchmark assessing companions across personality development and empathetic
interaction, balancing emotional intelligence with linguistic fluency. H2HTalk
features 4,650 curated scenarios spanning dialogue, recollection, and itinerary
planning that mirror real-world support conversations, substantially exceeding
previous datasets in scale and diversity. We incorporate a Secure Attachment
Persona (SAP) module implementing attachment-theory principles for safer
interactions. Benchmarking 50 LLMs with our unified protocol reveals that
long-horizon planning and memory retention remain key challenges, with models
struggling when user needs are implicit or evolve mid-conversation. H2HTalk
establishes the first comprehensive benchmark for emotionally intelligent
companions. We release all materials to advance the development of LLMs capable of
providing meaningful and safe psychological support.