Global-Local Interaction and Recalibration Network for Salient Object Detection in Optical Remote Sensing Images.
Journal:
IEEE Transactions on Cybernetics
Published Date:
Jun 2, 2026
Abstract
Optical remote sensing images (RSIs) cover large areas with complex geographic backgrounds, and the salient objects in them vary widely in scale, shape, and orientation. In recent years, many deep learning-based approaches, including convolutional neural network (CNN)-based and Transformer-based models, have been proposed for salient object detection (SOD) in optical RSIs. However, these models do not fully exploit the complementarity of, or account for the differences between, CNN and Transformer features, which degrades performance. In this article, we therefore propose a global-local interaction and recalibration network (GLIR-Net), which leverages the complementarity of global context from the Transformer and local details from the CNN while harmonizing their inherent differences. First, the encoder uses a two-branch architecture, comprising a Transformer branch and a CNN branch, to extract global contextual features and local detailed features, respectively. Second, a multiscale feature interaction (MFI) module promotes interaction between the CNN and Transformer features, so that the two are mutually enhanced through their complementarity. Third, the two kinds of features are aggregated by a feature recalibration module (FRM), which fuses and recalibrates them to harmonize their inherent differences. Finally, a decoder progressively produces a high-quality saliency map for each optical RSI. Extensive comparative experiments against state-of-the-art models on two public datasets show that the proposed GLIR-Net outperforms all competitors both qualitatively and quantitatively.
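The four stages named in the abstract (two-branch encoding, interaction, recalibration, decoding) can be illustrated with a toy data-flow sketch. The abstract gives no equations, so everything below is an assumption made for illustration: the sigmoid cross-gating used for interaction, the mean-based channel gate used for recalibration, the averaging decoder, and all function names are hypothetical stand-ins, not the actual MFI, FRM, or decoder of GLIR-Net. The sketch also works at a single scale on already-extracted features, whereas the described network is multiscale.

```python
import math
from typing import List, Tuple

# A feature map is modelled as [channel][flattened spatial position].
Feature = List[List[float]]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def interact(cnn: Feature, trans: Feature) -> Tuple[Feature, Feature]:
    """Hypothetical interaction step (NOT the paper's MFI):
    each branch is gated by the other branch and keeps a residual path."""
    cnn_out = [[c + c * sigmoid(t) for c, t in zip(c_row, t_row)]
               for c_row, t_row in zip(cnn, trans)]
    trans_out = [[t + t * sigmoid(c) for c, t in zip(c_row, t_row)]
                 for c_row, t_row in zip(cnn, trans)]
    return cnn_out, trans_out


def recalibrate(cnn: Feature, trans: Feature) -> Feature:
    """Hypothetical recalibration step (NOT the paper's FRM):
    sum-fuse the branches, then rescale each channel by a gate
    computed from that channel's spatial mean."""
    fused = [[c + t for c, t in zip(c_row, t_row)]
             for c_row, t_row in zip(cnn, trans)]
    out = []
    for row in fused:
        gate = sigmoid(sum(row) / len(row))
        out.append([gate * v for v in row])
    return out


def decode(fused: Feature) -> List[float]:
    """Toy decoder: average over channels, squash to a (0, 1) saliency score."""
    n_channels = len(fused)
    n_positions = len(fused[0])
    return [sigmoid(sum(ch[i] for ch in fused) / n_channels)
            for i in range(n_positions)]


def glir_sketch(cnn: Feature, trans: Feature) -> List[float]:
    """Chain the stages: interaction -> recalibration -> decoding."""
    cnn_e, trans_e = interact(cnn, trans)
    return decode(recalibrate(cnn_e, trans_e))


# Made-up inputs: 2 channels, 2 spatial positions per branch.
saliency = glir_sketch([[0.0, 1.0], [2.0, -1.0]], [[0.0, 0.5], [1.0, 0.0]])
```

Here `saliency` holds one score per spatial position. In a real model the two inputs would come from a CNN backbone and a Transformer backbone, and every step would carry learned weights.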