Hierarchical Semantic-Visual Fusion of Visible and Near-infrared Images for Long-range Haze Removal
Journal: arXiv
Published Date: Jul 5, 2025
Abstract
While image dehazing has advanced substantially in the past decade, most
efforts have focused on short-range scenarios, leaving long-range haze removal
under-explored. As distance increases, intensified scattering leads to severe
haze and signal loss, making it impractical to recover distant details solely
from visible images. Near-infrared imaging, with its superior fog
penetration, offers critical complementary cues through multimodal fusion.
However, existing fusion methods focus on content integration and often
neglect the haze embedded in the visible image, leaving residual haze in their
results. In this work, we argue
that the infrared and visible modalities not only provide complementary
low-level visual features but also share consistent high-level semantics.
Motivated by this, we propose a Hierarchical Semantic-Visual Fusion (HSVF)
framework, comprising a semantic stream to reconstruct haze-free scenes and a
visual stream to incorporate structural details from the near-infrared
modality. The semantic stream first obtains haze-robust semantic predictions by
aligning modality-invariant intrinsic representations; these shared semantics
then serve as strong priors for restoring clear, high-contrast distant scenes
under severe haze degradation. In parallel, the visual stream recovers lost
structural details from the near-infrared image by fusing complementary cues
from both the visible and near-infrared images. Through the cooperation of the
two streams, HSVF produces results that exhibit both high-contrast scenes and
rich texture details. Moreover, we introduce a novel pixel-aligned
visible-infrared haze dataset with semantic labels to facilitate benchmarking.
Extensive experiments demonstrate the superiority of our method over
state-of-the-art approaches in real-world long-range haze removal.
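The point that scattering intensifies with distance can be made concrete with the standard atmospheric scattering model, I = J·t + A(1 − t) with transmission t = exp(−βd), which the abstract does not state explicitly. The sketch below is illustrative only: the scattering coefficient `BETA_VISIBLE`, the power-law wavelength exponent `ANGSTROM_Q`, the chosen wavelengths (550 nm visible, 850 nm near-infrared), and all function names are assumptions for illustration, not values or code from HSVF.

```python
import math

# Illustrative assumptions only (not from the paper):
BETA_VISIBLE = 1e-3  # hypothetical scattering coefficient at 550 nm, per metre
ANGSTROM_Q = 1.3     # hypothetical wavelength exponent for haze particles


def scattering_coefficient(wavelength_nm: float,
                           beta_ref: float = BETA_VISIBLE,
                           ref_nm: float = 550.0,
                           q: float = ANGSTROM_Q) -> float:
    """Assumed power law: beta(lambda) = beta_ref * (lambda / ref)^(-q)."""
    return beta_ref * (wavelength_nm / ref_nm) ** (-q)


def transmission(beta: float, distance_m: float) -> float:
    """Fraction of scene radiance surviving attenuation: t = exp(-beta * d)."""
    return math.exp(-beta * distance_m)


def hazy_intensity(clear: float, airlight: float, t: float) -> float:
    """Atmospheric scattering model: I = J * t + A * (1 - t)."""
    return clear * t + airlight * (1.0 - t)


def residual_contrast(j_dark: float, j_bright: float,
                      airlight: float, t: float) -> float:
    """Contrast between two scene points after haze; equals |J1 - J2| * t."""
    return abs(hazy_intensity(j_bright, airlight, t)
               - hazy_intensity(j_dark, airlight, t))


if __name__ == "__main__":
    beta_vis = scattering_coefficient(550.0)  # visible band (assumed)
    beta_nir = scattering_coefficient(850.0)  # near-infrared band (assumed)
    for d in (100, 1_000, 5_000, 10_000):
        c_vis = residual_contrast(0.2, 0.8, 0.9, transmission(beta_vis, d))
        c_nir = residual_contrast(0.2, 0.8, 0.9, transmission(beta_nir, d))
        print(f"{d:>6} m  visible contrast {c_vis:.4f}  NIR contrast {c_nir:.4f}")
```

Because the residual contrast equals |J1 − J2|·t, it decays exponentially with distance, so distant detail in the visible image quickly falls below what can be recovered. Under the assumed power law, the smaller β at near-infrared wavelengths slows that decay, which is the intuition behind using near-infrared as a complementary cue. Real haze scattering depends on particle size, so the exponent here should be treated as a tunable assumption rather than a measured value.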