TCC-Bench: Benchmarking the Traditional Chinese Culture Understanding Capabilities of MLLMs
Journal: arXiv
Published Date: May 16, 2025
Abstract
Recent progress in Multimodal Large Language Models (MLLMs) has
significantly enhanced the ability of artificial intelligence systems to
understand and generate multimodal content. However, these models often exhibit
limited effectiveness when applied to non-Western cultural contexts, which
raises concerns about their wider applicability. To address this limitation, we
propose the Traditional Chinese Culture understanding Benchmark (TCC-Bench), a
bilingual (i.e., Chinese and English) Visual Question Answering (VQA) benchmark
specifically designed for assessing the understanding of traditional Chinese
culture by MLLMs. TCC-Bench comprises culturally rich and visually diverse
data, incorporating images from museum artifacts, everyday life scenes, comics,
and other culturally significant contexts. We adopt a semi-automated pipeline
that utilizes GPT-4o in text-only mode to generate candidate questions,
followed by human curation to ensure data quality and avoid potential data
leakage. The benchmark also avoids language bias by preventing direct
disclosure of cultural concepts within question texts. Experimental evaluations
across a wide range of MLLMs demonstrate that current models still face
significant challenges when reasoning about culturally grounded visual content.
The results highlight the need for further research in developing culturally
inclusive and context-aware multimodal systems. The code and data can be found
at: https://tcc-bench.github.io/.
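
The language-bias safeguard described above (question texts should not directly disclose the cultural concept shown in the image) can be illustrated with a simple screening step. This is only a minimal sketch under stated assumptions, not the authors' pipeline: the function name `discloses_concept`, the record fields `question` and `concept_terms`, and the example questions are all hypothetical, and the benchmark itself relies on human curation rather than a string match.

```python
def discloses_concept(question: str, concept_terms: list[str]) -> bool:
    """Return True if any concept term appears verbatim in the question text.

    Hypothetical helper: a case-insensitive substring match, which works for
    both English and Chinese terms but cannot catch paraphrases.
    """
    q = question.casefold()
    return any(term.casefold() in q for term in concept_terms if term)


# Hypothetical candidate questions for one image (illustrative data only).
candidates = [
    {
        "question": "Which festival is the food in the image associated with?",
        "concept_terms": ["zongzi", "粽子"],
    },
    {
        "question": "What is zongzi traditionally wrapped in?",
        "concept_terms": ["zongzi", "粽子"],
    },
    {
        "question": "图中的粽子通常在哪个节日食用？",
        "concept_terms": ["zongzi", "粽子"],
    },
]

# Questions that name the concept would be sent back for rewriting or review.
flagged = [
    c["question"]
    for c in candidates
    if discloses_concept(c["question"], c["concept_terms"])
]
```

A check like this could only serve as a first-pass filter ahead of human review, since a question can give away the concept through a synonym or description that no term list anticipates.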