Redemption Score: An Evaluation Framework to Rank Image Captions While Redeeming Image Semantics and Language Pragmatics
Journal:
arXiv
Published Date:
May 22, 2025
Abstract
Evaluating image captions requires cohesive assessment of both visual
semantics and language pragmatics, which is often not entirely captured by most
metrics. We introduce Redemption Score, a novel hybrid framework that ranks
image captions by triangulating three complementary signals: (1) Mutual
Information Divergence (MID) for global image-text distributional alignment,
(2) DINO-based perceptual similarity of cycle-generated images for visual
grounding, and (3) BERTScore for contextual text similarity against human
references. A calibrated fusion of these signals allows Redemption Score to
offer a more holistic assessment. On the Flickr8k benchmark, Redemption Score
achieves a Kendall-$\tau$ of 56.43, outperforming twelve prior methods and
demonstrating superior correlation with human judgments without requiring
task-specific training. Our framework provides a more robust and nuanced
evaluation by effectively redeeming image semantics and linguistic
interpretability indicated by strong transfer of knowledge in the Conceptual
Captions and MS COCO datasets.