Deconstructing Bias: A Multifaceted Framework for Diagnosing Cultural and Compositional Inequities in Text-to-Image Generative Models
Journal: arXiv
Published Date: Apr 5, 2025
Abstract
The transformative potential of text-to-image (T2I) models hinges on their
ability to synthesize culturally diverse, photorealistic images from textual
prompts. However, these models often perpetuate cultural biases embedded within
their training data, leading to systemic misrepresentations. This paper
benchmarks the Component Inclusion Score (CIS), a metric designed to evaluate
the fidelity of image generation across cultural contexts. Analyzing 2,400
images, we quantify bias in terms of compositional fragility and contextual
misalignment and find significant performance gaps between prompts drawn from
Western and non-Western cultural contexts. Our findings underscore the
impact of data imbalance, attention entropy, and embedding superposition on
model fairness. By benchmarking models such as Stable Diffusion with CIS, we
provide insights into architectural and data-centric interventions for
enhancing cultural inclusivity in AI-generated imagery. The resulting framework
serves as a tool for diagnosing and mitigating biases in T2I generation, in
support of more equitable AI systems.