Modeling Saliency Dataset Bias
Journal:
arXiv
Published Date:
May 15, 2025
Abstract
Recent advances in image-based saliency prediction are approaching gold
standard performance levels on existing benchmarks. Despite this success, we
show that predicting fixations across multiple saliency datasets remains
challenging due to dataset bias. We find a significant performance drop (around
40%) when models trained on one dataset are applied to another. Surprisingly,
increasing dataset diversity does not resolve this inter-dataset gap, with
close to 60% attributed to dataset-specific biases. To address this remaining
generalization gap, we propose a novel architecture extending a mostly
dataset-agnostic encoder-decoder structure with fewer than 20 dataset-specific
parameters that govern interpretable mechanisms such as multi-scale structure,
center bias, and fixation spread. Adapting only these parameters to new data
accounts for more than 75% of the generalization gap, with a large fraction of
the improvement achieved with as few as 50 samples. Our model sets a new
state-of-the-art on all three datasets of the MIT/Tuebingen Saliency Benchmark
(MIT300, CAT2000, and COCO-Freeview), even when purely generalizing from
unrelated datasets, but with a substantial boost when adapting to the
respective training datasets. The model also provides valuable insights into
spatial saliency properties, revealing complex multi-scale effects that combine
both absolute and relative sizes.