VIGNETTE: Socially Grounded Bias Evaluation for Vision-Language Models
Journal: arXiv
Published Date: May 28, 2025
Abstract
While bias in large language models (LLMs) is well-studied, similar concerns
in vision-language models (VLMs) have received comparatively less attention.
Existing VLM bias studies often focus on portrait-style images and
gender-occupation associations, overlooking broader and more complex social
stereotypes and their implied harm. This work introduces VIGNETTE, a
large-scale VQA benchmark with 30M+ images for evaluating bias in VLMs through
a question-answering framework spanning four directions: factuality,
perception, stereotyping, and decision making. Moving beyond narrowly focused
studies, we assess how VLMs interpret identities in contextualized settings,
revealing how models make trait and capability assumptions and exhibit patterns
of discrimination. Drawing from social psychology, we examine how VLMs connect
visual identity cues to trait- and role-based inferences, encoding social
hierarchies through biased selections. Our findings uncover subtle,
multifaceted, and surprising stereotypical patterns, offering insights into how
VLMs construct social meaning from inputs.