EMRA-proxy: Enhancing Multi-Class Region Semantic Segmentation in Remote Sensing Images with Attention Proxy
Journal:
arXiv
Published Date:
May 23, 2025
Abstract
High-resolution remote sensing (HRRS) image segmentation is challenging due
to complex spatial layouts and diverse object appearances. While CNNs excel at
capturing local features, they struggle with long-range dependencies, whereas
Transformers can model global context but often neglect local details and are
computationally expensive. We propose a novel approach, Region-Aware Proxy
Network (RAPNet), which consists of two components: Contextual Region Attention
(CRA) and Global Class Refinement (GCR). Unlike traditional methods that rely
on grid-based layouts, RAPNet operates at the region level for more flexible
segmentation. The CRA module uses a Transformer to capture region-level
contextual dependencies, generating a Semantic Region Mask (SRM). The GCR
module learns a global class attention map to refine multi-class information,
combining the SRM and attention map for accurate segmentation. Experiments on
three public datasets show that RAPNet outperforms state-of-the-art methods,
achieving superior multi-class segmentation accuracy.
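
The abstract does not give implementation details, so the following is only a minimal NumPy sketch of the data flow it describes: region-level attention produces a per-pixel Semantic Region Mask, a class attention map is computed over all pixels, and the two are combined. The function names, the average pooling of pixel features into regions, the single-head attention, the shared `class_proxies` matrix, and the element-wise product used to combine the two maps are all assumptions made for illustration, not the authors' design.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def region_attention(region_feats):
    # Hypothetical stand-in for CRA: single-head self-attention over
    # region features of shape (R, D), without learned projections.
    d = region_feats.shape[1]
    scores = region_feats @ region_feats.T / np.sqrt(d)
    return softmax(scores, axis=-1) @ region_feats

def segment(pixel_feats, region_ids, class_proxies):
    # pixel_feats: (H, W, D) per-pixel features
    # region_ids: (H, W) integer region index per pixel, in [0, R)
    # class_proxies: (C, D) one vector per class (assumed, illustrative)
    h, w, d = pixel_feats.shape
    num_regions = int(region_ids.max()) + 1
    flat_ids = region_ids.reshape(-1)
    flat_feats = pixel_feats.reshape(-1, d)

    # Average-pool pixel features into one feature vector per region.
    sums = np.zeros((num_regions, d))
    np.add.at(sums, flat_ids, flat_feats)
    counts = np.bincount(flat_ids, minlength=num_regions).reshape(-1, 1)
    region_feats = sums / np.maximum(counts, 1)

    # Region-level context, then class scores per region.
    ctx = region_attention(region_feats)
    region_probs = softmax(ctx @ class_proxies.T, axis=-1)   # (R, C)

    # Broadcast region scores back to pixels: the "Semantic Region Mask".
    srm = region_probs[region_ids]                            # (H, W, C)

    # Hypothetical stand-in for GCR: per-pixel class attention map.
    class_attn = softmax(pixel_feats @ class_proxies.T, axis=-1)  # (H, W, C)

    # Assumed combination rule: element-wise product, then argmax.
    return (srm * class_attn).argmax(axis=-1)                 # (H, W)

rng = np.random.default_rng(0)
pixel_feats = rng.normal(size=(4, 6, 8))
region_ids = np.arange(24).reshape(4, 6) // 6   # four regions, one per row
class_proxies = rng.normal(size=(3, 8))
labels = segment(pixel_feats, region_ids, class_proxies)
```

Here `labels` is an `(H, W)` array of class indices. In a trained model the region partition, the attention weights, and the class vectors would be learned or produced by upstream modules; the toy inputs above exist only to exercise the shapes.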