Rapid Salient Object Detection with Difference Convolutional Neural Networks
Journal:
arXiv
Published Date:
Jul 1, 2025
Abstract
This paper addresses the challenge of deploying salient object detection
(SOD) on resource-constrained devices with real-time performance. While recent
advances in deep neural networks have improved SOD, existing top-leading models
are computationally expensive. We propose an efficient network design that
combines traditional wisdom on SOD and the representation power of modern CNNs.
Like biologically-inspired classical SOD methods relying on computing contrast
cues to determine saliency of image regions, our model leverages Pixel
Difference Convolutions (PDCs) to encode the feature contrasts. Differently,
PDCs are incorporated in a CNN architecture so that the valuable contrast cues
are extracted from rich feature maps. For efficiency, we introduce a difference
convolution reparameterization (DCR) strategy that embeds PDCs into standard
convolutions, eliminating computation and parameters at inference.
Additionally, we introduce SpatioTemporal Difference Convolution (STDC) for
video SOD, enhancing the standard 3D convolution with spatiotemporal contrast
capture. Our models, SDNet for image SOD and STDNet for video SOD, achieve
significant improvements in efficiency-accuracy trade-offs. On a Jetson Orin
device, our models with $<$ 1M parameters operate at 46 FPS and 150 FPS on
streamed images and videos, surpassing the second-best lightweight models in
our experiments by more than $2\times$ and $3\times$ in speed with superior
accuracy. Code will be available at https://github.com/hellozhuo/stdnet.git.