Dual Semantic-Aware Network for Noise Suppressed Ultrasound Video Segmentation
Journal:
arXiv
Published Date:
Jul 10, 2025
Abstract
Ultrasound imaging is a prevalent diagnostic tool known for its simplicity
and non-invasiveness. However, its inherent characteristics often introduce
substantial noise, posing considerable challenges for automated lesion or organ
segmentation in ultrasound video sequences. To address these limitations, we
propose the Dual Semantic-Aware Network (DSANet), a novel framework designed to
enhance noise robustness in ultrasound video segmentation by fostering mutual
semantic awareness between local and global features. Specifically, we
introduce an Adjacent-Frame Semantic-Aware (AFSA) module, which constructs a
channel-wise similarity matrix to guide feature fusion across adjacent frames,
effectively mitigating the impact of random noise without relying on
pixel-level relationships. Additionally, we propose a Local-and-Global
Semantic-Aware (LGSA) module that reorganizes and fuses temporally unconditional
local features, which capture spatial details independently at each frame, with
conditional global features that incorporate temporal context from adjacent
frames. This integration facilitates multi-level semantic representation,
significantly improving the model's resilience to noise interference. Extensive
evaluations on four benchmark datasets demonstrate that DSANet substantially
outperforms state-of-the-art methods in segmentation accuracy. Moreover, since
our model avoids pixel-level feature dependencies, it achieves significantly
higher inference FPS than video-based methods, and even surpasses some
image-based models. Code is available at
https://github.com/ZhouL2001/DSANet
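
To make the channel-wise fusion idea behind AFSA concrete, the following is a minimal, dependency-free Python sketch. It is not the authors' implementation: the function names (`channel_similarity`, `fuse_adjacent`), the use of cosine similarity with a row-wise softmax, and the residual addition are all assumptions chosen for illustration. Each frame is represented as a list of C channels, and each channel as a flat list of H*W activations.

```python
import math


def channel_similarity(curr, prev):
    """Row-normalised channel-to-channel similarity between two frames.

    curr, prev: lists of C channels, each a flat list of H*W floats.
    Returns a C x C matrix; row i weights the channels of `prev` by
    their cosine similarity to channel i of `curr` (assumed metric).
    """
    def unit(v):
        # L2-normalise a channel; fall back to 1.0 for an all-zero channel.
        norm = math.sqrt(sum(x * x for x in v)) or 1.0
        return [x / norm for x in v]

    curr_unit = [unit(c) for c in curr]
    prev_unit = [unit(p) for p in prev]

    sim = []
    for a in curr_unit:
        row = [sum(x * y for x, y in zip(a, b)) for b in prev_unit]
        # Numerically stable softmax over the row.
        peak = max(row)
        exps = [math.exp(s - peak) for s in row]
        total = sum(exps)
        sim.append([e / total for e in exps])
    return sim


def fuse_adjacent(curr, prev):
    """Fuse the adjacent frame into the current one, channel by channel.

    Every output channel is the current channel plus a similarity-weighted
    mix of the adjacent frame's channels (residual form is an assumption).
    No pixel-to-pixel affinity is computed at any point.
    """
    sim = channel_similarity(curr, prev)
    fused = []
    for i, channel in enumerate(curr):
        mixed = [
            sum(sim[i][j] * prev[j][k] for j in range(len(prev)))
            for k in range(len(channel))
        ]
        fused.append([c + m for c, m in zip(channel, mixed)])
    return fused
```

The similarity matrix in this sketch has C x C entries regardless of spatial resolution. With illustrative values C = 64 and H = W = 32, that is 4,096 entries, whereas a pixel-level affinity matrix over the same feature map would have (32 * 32)^2 = 1,048,576 entries. This is one plausible reading of why avoiding pixel-level dependencies helps inference speed.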