More is Less? A Simulation-Based Approach to Dynamic Interactions between Biases in Multimodal Models
Journal: arXiv
Published Date: Dec 23, 2024
Abstract
Multimodal machine learning models, such as those that combine text and image
modalities, are increasingly used in critical domains including public safety,
security, and healthcare. However, these systems inherit biases from their
individual modalities. This study proposes a systemic framework for analyzing
dynamic multimodal bias interactions. Using the MMBias dataset, which
encompasses categories prone to bias such as religion, nationality, and sexual
orientation, this study adopts a simulation-based heuristic approach to compute
bias scores for text-only, image-only, and multimodal embeddings. A framework
is developed to classify bias interactions as amplification (multimodal bias
exceeds both unimodal biases), mitigation (multimodal bias is lower than both),
or neutrality (multimodal bias lies between the two unimodal biases), and
proportional analyses are conducted to identify the dominant modality and the
dynamics of these interactions. The findings show that amplification (22%)
occurs when text and image biases are comparable, while mitigation (11%) arises
when text bias dominates, highlighting the stabilizing role of image bias.
Neutral interactions (67%) are associated with higher text bias without
divergence. Conditional probabilities indicate that text dominates in
mitigation cases, whereas both modalities contribute in neutral and
amplification cases, underscoring the complex interplay between modalities.
The study therefore advocates this heuristic, systemic, and interpretable
framework for analyzing multimodal bias interactions: it offers insight into
how intermodal biases interact dynamically, has practical applications for
multimodal modeling, and transfers to context-based datasets, all of which are
essential for developing fair and equitable AI models.
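The interaction taxonomy in the abstract can be illustrated with a minimal Python sketch. It assumes per-sample text-only, image-only, and multimodal bias scores have already been computed, because the abstract does not say how the simulation produces them. The names `BiasScores`, `classify_interaction`, `dominant_modality`, `proportions`, and `conditional_dominance` are hypothetical. Two further choices are assumptions rather than the authors' definitions: scores that fall exactly on a boundary are counted as neutrality, and the "dominant modality" is taken to be the one with the larger unimodal score. The sample scores are invented for illustration only.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, Sequence, Tuple

LABELS = ("amplification", "mitigation", "neutrality")


@dataclass(frozen=True)
class BiasScores:
    """Hypothetical per-sample container for precomputed bias scores."""
    text: float
    image: float
    multimodal: float


def classify_interaction(s: BiasScores) -> str:
    """Label one sample using the amplification/mitigation/neutrality rules."""
    upper, lower = max(s.text, s.image), min(s.text, s.image)
    if s.multimodal > upper:
        return "amplification"  # multimodal bias exceeds both unimodal biases
    if s.multimodal < lower:
        return "mitigation"  # multimodal bias is lower than both
    return "neutrality"  # lies between them (boundaries included by assumption)


def dominant_modality(s: BiasScores) -> str:
    """Assumption: the dominant modality is the one with the larger unimodal score."""
    if s.text > s.image:
        return "text"
    if s.image > s.text:
        return "image"
    return "tie"


def proportions(samples: Sequence[BiasScores]) -> Dict[str, float]:
    """Share of samples falling into each interaction type."""
    if not samples:
        raise ValueError("need at least one sample")
    counts = Counter(classify_interaction(s) for s in samples)
    return {label: counts[label] / len(samples) for label in LABELS}


def conditional_dominance(samples: Sequence[BiasScores]) -> Dict[Tuple[str, str], float]:
    """Estimate P(dominant modality | interaction type) from joint counts."""
    joint = Counter((classify_interaction(s), dominant_modality(s)) for s in samples)
    per_type = Counter(classify_interaction(s) for s in samples)
    return {(label, mod): n / per_type[label] for (label, mod), n in joint.items()}


# Illustrative scores only; these are NOT values from the MMBias study.
samples = [
    BiasScores(text=0.30, image=0.28, multimodal=0.40),  # amplification
    BiasScores(text=0.50, image=0.10, multimodal=0.05),  # mitigation
    BiasScores(text=0.45, image=0.20, multimodal=0.30),  # neutrality, text dominant
    BiasScores(text=0.15, image=0.35, multimodal=0.25),  # neutrality, image dominant
]

print(proportions(samples))
# {'amplification': 0.25, 'mitigation': 0.25, 'neutrality': 0.5}
print(conditional_dominance(samples))
```

Counting boundary cases as neutrality makes the three labels exhaustive and mutually exclusive, so every sample receives exactly one label. The conditional estimates are plain frequency ratios within each interaction type.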