ShortcutProbe: Probing Prediction Shortcuts for Learning Robust Models
Journal:
arXiv
Published Date:
May 20, 2025
Abstract
Deep learning models often achieve high performance by inadvertently learning
spurious correlations between targets and non-essential features. For example,
an image classifier may identify an object via its background that spuriously
correlates with it. This prediction behavior, known as spurious bias, severely
degrades model performance on data that lacks the learned spurious
correlations. Existing methods on spurious bias mitigation typically require a
variety of data groups with spurious correlation annotations called group
labels. However, group labels require costly human annotations and often fail
to capture subtle spurious biases such as relying on specific pixels for
predictions. In this paper, we propose a novel post hoc spurious bias
mitigation framework without requiring group labels. Our framework, termed
ShortcutProbe, identifies prediction shortcuts that reflect potential
non-robustness in predictions in a given model's latent space. The model is
then retrained to be invariant to the identified prediction shortcuts for
improved robustness. We theoretically analyze the effectiveness of the
framework and empirically demonstrate that it is an efficient and practical
tool for improving a model's robustness to spurious bias on diverse datasets.