A Gift from the Integration of Discriminative and Diffusion-based Generative Learning: Boundary Refinement Remote Sensing Semantic Segmentation
Journal:
arXiv
Published Date:
Jul 2, 2025
Abstract
Remote sensing semantic segmentation must address both what the ground
objects are within an image and where they are located. Consequently,
segmentation models must ensure not only the semantic correctness of
large-scale patches (low-frequency information) but also the precise
localization of boundaries between patches (high-frequency information).
However, most existing approaches rely heavily on discriminative learning,
which excels at capturing low-frequency features, while overlooking its
inherent limitations in learning high-frequency features for semantic
segmentation. Recent studies have revealed that diffusion generative models
excel at generating high-frequency details. Our theoretical analysis confirms
that the diffusion denoising process significantly enhances the model's ability
to learn high-frequency features; however, we also observe that these models
exhibit insufficient semantic inference for low-frequency features when guided
solely by the original image. Therefore, we integrate the strengths of both
discriminative and generative learning, proposing the Integration of
Discriminative and diffusion-based Generative learning for Boundary Refinement
(IDGBR) framework. The framework first generates a coarse segmentation map
using a discriminative backbone model. This map and the original image are fed
into a conditioning guidance network to jointly learn a guidance representation
subsequently leveraged by an iterative denoising diffusion process refining the
coarse segmentation. Extensive experiments across five remote sensing semantic
segmentation datasets (binary and multi-class segmentation) confirm our
framework's capability of consistent boundary refinement for coarse results
from diverse discriminative architectures. The source code will be available at
https://github.com/KeyanHu-git/IDGBR.