Finer-CAM: Spotting the Difference Reveals Finer Details for Visual Explanation
Journal: arXiv
Published Date: Jan 20, 2025
Abstract
Class activation map (CAM) has been widely used to highlight image regions
that contribute to class predictions. Despite its simplicity and computational
efficiency, CAM often struggles to identify discriminative regions that
distinguish visually similar fine-grained classes. Prior efforts address this
limitation by introducing more sophisticated explanation processes, but at the
cost of extra complexity. In this paper, we propose Finer-CAM, a method that
retains CAM's efficiency while achieving precise localization of discriminative
regions. Our key insight is that the deficiency of CAM lies not in "how" it
explains, but in "what" it explains. Specifically, previous methods attempt to
identify all cues contributing to the target class's logit value, which
inadvertently also activates regions predictive of visually similar classes. By
explicitly comparing the target class with similar classes and spotting their
differences, Finer-CAM suppresses features shared with other classes and
emphasizes the unique, discriminative details of the target class. Finer-CAM is
easy to implement, compatible with various CAM methods, and can be extended to
multi-modal models for accurate localization of specific concepts.
Additionally, Finer-CAM allows adjustable comparison strength, enabling users
to selectively highlight coarse object contours or fine discriminative details.
Quantitatively, we show that masking out the top 5% of pixels activated by
Finer-CAM results in a larger relative confidence drop than the baselines achieve.
The source code and demo are available at
https://github.com/Imageomics/Finer-CAM.
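
The comparison idea in the abstract can be illustrated with a minimal NumPy sketch. It is not taken from the released code. It assumes a classic CAM setting (a linear classifier on globally pooled feature maps) and assumes that "comparing" means explaining the logit difference y_target - alpha * y_similar, where alpha stands in for the adjustable comparison strength. The function names, the toy feature maps, and the definition of relative confidence drop as (p_original - p_masked) / p_original are all hypothetical choices made for illustration.

```python
import numpy as np


def linear_cam(features, weights, class_idx):
    """Classic CAM: weight each feature channel by one class's classifier weight.

    features: array of shape (K, H, W); weights: array of shape (C, K).
    """
    cam = np.tensordot(weights[class_idx], features, axes=1)  # shape (H, W)
    return np.maximum(cam, 0.0)  # keep only positive evidence


def difference_cam(features, weights, target_idx, similar_idx, alpha=1.0):
    """CAM of the assumed logit difference y_target - alpha * y_similar.

    alpha = 0 reduces to the classic CAM; larger alpha suppresses more of
    the evidence shared with the similar class.
    """
    w = weights[target_idx] - alpha * weights[similar_idx]
    cam = np.tensordot(w, features, axes=1)
    return np.maximum(cam, 0.0)


def top_fraction_mask(cam, fraction=0.05):
    """Boolean mask of the most activated pixels (ties may add a few extra)."""
    k = max(1, int(round(fraction * cam.size)))
    threshold = np.partition(cam.ravel(), -k)[-k]  # k-th largest value
    return cam >= threshold


def relative_confidence_drop(p_original, p_masked):
    """Assumed definition: fraction of the original confidence that is lost."""
    return (p_original - p_masked) / p_original


# Toy data: 3 channels on a 4x4 grid.
body = np.zeros((4, 4))
body[1:3, 1:3] = 1.0            # channel 0: region shared by both classes
target_detail = np.zeros((4, 4))
target_detail[0, 0] = 1.0       # channel 1: detail unique to the target class
similar_detail = np.zeros((4, 4))
similar_detail[3, 3] = 1.0      # channel 2: detail unique to the similar class
features = np.stack([body, target_detail, similar_detail])

weights = np.array([
    [1.0, 1.0, 0.0],  # class 0 (target): shared body + its own detail
    [1.0, 0.0, 1.0],  # class 1 (similar): shared body + its own detail
])

coarse = linear_cam(features, weights, class_idx=0)
fine = difference_cam(features, weights, target_idx=0, similar_idx=1, alpha=1.0)
print(np.count_nonzero(coarse), np.count_nonzero(fine))
```

In this toy case the classic map lights up the shared body region as well as the target's own detail, while the difference map with alpha = 1 keeps only the pixel at (0, 0). Intermediate values of alpha trade off between the two, which is one way to read the "coarse contours versus fine details" control described above.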