Cultural Evaluations of Vision-Language Models Have a Lot to Learn from Cultural Theory

Journal: arXiv

Published Date: May 28, 2025

Abstract

Modern vision-language models (VLMs) often fail at cultural competency evaluations and benchmarks. Given the diversity of applications built upon VLMs, there is renewed interest in understanding how they encode cultural nuances. While individual aspects of this problem have been studied, we still lack a comprehensive framework for systematically identifying and annotating the nuanced cultural dimensions present in images for VLMs. This position paper argues that foundational methodologies from visual culture studies (cultural studies, semiotics, and visual studies) are necessary for cultural analysis of images. Building upon a review of these methodologies, we propose a set of five frameworks, corresponding to cultural dimensions, that must be considered for a more complete analysis of the cultural competencies of VLMs.

Authors

Srishti Yadav
Lauren Tilton
Maria Antoniak
Taylor Arnold
Jiaang Li
Siddhesh Milind Pawar
Antonia Karamolegkou
Stella Frank
Zhaochong An
Negar Rostamzadeh
Daniel Hershcovich
Serge Belongie
Ekaterina Shutova

External Resources

View on arXiv (http://arxiv.org/abs/2505.22793v1)