SeG-SR: Integrating Semantic Knowledge into Remote Sensing Image Super-Resolution via Vision-Language Model
Journal:
arXiv
Published Date:
May 29, 2025
Abstract
High-resolution (HR) remote sensing imagery plays a vital role in a wide
range of applications, including urban planning and environmental monitoring.
However, due to limitations in sensors and data transmission links, the images
acquired in practice often suffer from resolution degradation. Remote Sensing
Image Super-Resolution (RSISR) aims to reconstruct HR images from
low-resolution (LR) inputs, providing a cost-effective and efficient
alternative to direct HR image acquisition. Existing RSISR methods primarily
focus on low-level characteristics in pixel space, while neglecting the
high-level understanding of remote sensing scenes. This may lead to
semantically inconsistent artifacts in the reconstructed results. Motivated by
this observation, our work aims to explore the role of high-level semantic
knowledge in improving RSISR performance. We propose a Semantic-Guided
Super-Resolution framework, SeG-SR, which leverages Vision-Language Models
(VLMs) to extract semantic knowledge from input images and uses it to guide the
super resolution (SR) process. Specifically, we first design a Semantic Feature
Extraction Module (SFEM) that utilizes a pretrained VLM to extract semantic
knowledge from remote sensing images. Next, we propose a Semantic Localization
Module (SLM), which derives a series of semantic guidance from the extracted
semantic knowledge. Finally, we develop a Learnable Modulation Module (LMM)
that uses semantic guidance to modulate the features extracted by the SR
network, effectively incorporating high-level scene understanding into the SR
pipeline. We validate the effectiveness and generalizability of SeG-SR through
extensive experiments: SeG-SR achieves state-of-the-art performance on two
datasets and consistently delivers performance improvements across various SR
architectures. Codes can be found at https://github.com/Mr-Bamboo/SeG-SR.