The Multimodal Paradox: How Added and Missing Modalities Shape Bias and Performance in Multimodal AI
Journal:
arXiv
Published Date:
May 5, 2025
Abstract
Multimodal learning, which integrates diverse data sources such as images,
text, and structured data, has proven superior to unimodal counterparts in
high-stakes decision-making. However, while performance gains remain the gold
standard for evaluating multimodal systems, concerns around bias and robustness
are frequently overlooked. In this context, this paper explores two key
research questions (RQs): (i) RQ1 examines whether adding a modality
con-sistently enhances performance and investigates its role in shaping
fairness measures, assessing whether it mitigates or amplifies bias in
multimodal models; (ii) RQ2 investigates the impact of missing modalities at
inference time, analyzing how multimodal models generalize in terms of both
performance and fairness. Our analysis reveals that incorporating new
modalities during training consistently enhances the performance of multimodal
models, while fairness trends exhibit variability across different evaluation
measures and datasets. Additionally, the absence of modalities at inference
degrades performance and fairness, raising concerns about its robustness in
real-world deployment. We conduct extensive experiments using multimodal
healthcare datasets containing images, time series, and structured information
to validate our findings.