CUBIC: Concept Embeddings for Unsupervised Bias Identification using VLMs
Journal: arXiv
Published Date: May 16, 2025
Abstract
Deep vision models often rely on biases learned from spurious correlations in
datasets. To identify these biases, methods that interpret high-level,
human-understandable concepts are more effective than those relying primarily
on low-level features like heatmaps. A major challenge for these concept-based
methods is the lack of image annotations indicating potentially bias-inducing
concepts, since creating such annotations requires detailed labeling for each
dataset and concept, which is highly labor-intensive. We present CUBIC (Concept
embeddings for Unsupervised Bias IdentifiCation), a novel method that
automatically discovers interpretable concepts that may bias classifier
behavior. Unlike existing approaches, CUBIC does not rely on predefined bias
candidates or examples of model failures tied to specific biases, as such
information is not always available. Instead, it leverages an image-text latent
space and linear classifier probes to examine how the latent representation of
a superclass label, shared by all instances in the dataset, is influenced by
the presence of a given concept. By
measuring these shifts against the normal vector to the classifier's decision
boundary, CUBIC identifies concepts that significantly influence model
predictions. Our experiments demonstrate that CUBIC effectively uncovers
previously unknown biases using Vision-Language Models (VLMs), requiring
neither the dataset samples on which the classifier underperforms nor prior
knowledge of potential biases.
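
The scoring idea described in the abstract, projecting a concept-induced embedding shift onto the normal of a linear probe's decision boundary, can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a binary linear probe, uses synthetic vectors in place of real VLM embeddings, and the function name `concept_shift_scores` and the concept names are hypothetical.

```python
import numpy as np


def concept_shift_scores(base_emb, concept_embs, probe_weight):
    """Score each concept by projecting its embedding shift onto the probe normal.

    base_emb:      embedding of the superclass label alone, shape (d,)
    concept_embs:  dict mapping concept name -> embedding of label plus concept
    probe_weight:  weight vector of a binary linear probe, shape (d,)
    """
    normal = probe_weight / np.linalg.norm(probe_weight)  # unit normal to the boundary
    return {
        name: float((emb - base_emb) @ normal)  # signed shift along the normal
        for name, emb in concept_embs.items()
    }


# Toy stand-ins for VLM text embeddings (assumption: a real pipeline would
# obtain these from an image-text encoder, which is not shown here).
rng = np.random.default_rng(0)
dim = 8
base = rng.normal(size=dim)
e0, e1 = np.eye(dim)[0], np.eye(dim)[1]

probe_weight = 2.0 * e0  # hypothetical probe whose boundary is orthogonal to axis 0
concept_embs = {
    "water background": base + 0.5 * e0,  # shifts toward the positive class
    "indoor": base + 0.5 * e1,            # shifts parallel to the boundary
    "forest": base - 0.3 * e0,            # shifts toward the negative class
}

scores = concept_shift_scores(base, concept_embs, probe_weight)
ranked = sorted(scores, key=lambda name: abs(scores[name]), reverse=True)
print(ranked)
```

In this toy setup, a concept whose shift lies along the probe normal receives a large absolute score, while a shift parallel to the decision boundary scores near zero. How CUBIC thresholds or tests such scores for significance is not specified in the abstract and is left out here.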