Pinpoint Counterfactuals: Reducing social bias in foundation models via localized counterfactual generation
Journal: arXiv
Published Date: Dec 12, 2024
Abstract
Foundation models trained on web-scraped datasets propagate societal biases
to downstream tasks. While counterfactual generation enables bias analysis,
existing methods introduce artifacts by modifying contextual elements like
clothing and background. We present a localized counterfactual generation
method that preserves image context by constraining counterfactual
modifications to specific attribute-relevant regions through automated masking
and guided inpainting. When applied to the Conceptual Captions dataset to
create gender counterfactuals, our method achieves higher visual and
semantic fidelity than state-of-the-art alternatives, while matching the
performance of models trained using only real data on non-human-centric tasks.
Models fine-tuned with our counterfactuals demonstrate measurable bias
reduction across multiple metrics, including lower gender
classification disparity and more balanced person preference scores, while
preserving ImageNet zero-shot performance. The results establish a framework
for creating balanced datasets that enable both accurate bias profiling and
effective mitigation.
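
To illustrate the localization constraint described above, the following is a minimal sketch, not the authors' pipeline. It assumes that a binary mask for the attribute-relevant region and a generated (inpainted) image are already available, and shows only the compositing step that keeps every pixel outside the mask identical to the original. The function names `composite_localized` and `fraction_changed` are hypothetical and chosen for illustration. The abstract does not specify the masking and inpainting models, so they are not modeled here.

```python
import numpy as np


def composite_localized(original, generated, mask):
    """Return an image that takes generated pixels inside the mask
    and original pixels everywhere else.

    original, generated: arrays of shape (H, W, C) with the same dtype.
    mask: boolean array of shape (H, W); True marks the editable region.
    """
    if original.shape != generated.shape:
        raise ValueError("original and generated must have the same shape")
    if mask.shape != original.shape[:2]:
        raise ValueError("mask must match the image height and width")
    # Broadcast the mask over the channel axis and blend.
    m = mask.astype(np.float32)[..., None]
    blended = m * generated.astype(np.float32) + (1.0 - m) * original.astype(np.float32)
    return blended.astype(original.dtype)


def fraction_changed(original, edited):
    """Fraction of pixel locations where any channel differs."""
    return float(np.any(original != edited, axis=-1).mean())
```

With a hard binary mask, context such as clothing and background outside the masked region is preserved exactly, so `fraction_changed` can never exceed the fraction of the image covered by the mask.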