Unleashing Diffusion and State Space Models for Medical Image Segmentation
Journal: arXiv
Published Date: Jun 15, 2025
Abstract
Existing segmentation models trained on a single medical imaging dataset
often lack robustness when encountering unseen organs or tumors. Developing a
robust model capable of identifying rare or novel tumor categories not present
during training is crucial for advancing medical imaging applications. We
propose DSM, a novel framework that leverages diffusion and state space models
to segment unseen tumor categories beyond the training data. DSM utilizes two
sets of object queries trained within modified attention decoders to enhance
classification accuracy. Initially, the model learns organ queries using an
object-aware feature grouping strategy to capture organ-level visual features.
It then refines tumor queries by focusing on diffusion-based visual prompts,
enabling precise segmentation of previously unseen tumors. Furthermore, we
incorporate diffusion-guided feature fusion to improve semantic segmentation
performance. By integrating CLIP text embeddings, DSM captures
category-sensitive class information and transfers linguistic knowledge,
thereby enhancing the model's robustness across diverse scenarios and multi-label
tasks. Extensive experiments demonstrate the superior performance of DSM in
various tumor segmentation tasks. Code is available at
https://github.com/Rows21/KMax-Mamba.
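
As a rough illustration of how text embeddings can be used to label object queries, the sketch below scores each query against a set of class-name embeddings by cosine similarity and applies a softmax. It is not the DSM implementation: the function names, the temperature value, and the toy 3-dimensional vectors (standing in for real CLIP text embeddings and decoder queries) are all assumptions made for the example.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def classify_queries(queries, text_embeddings, temperature=0.07):
    """For each query, return a softmax distribution over class names.

    `queries` and `text_embeddings` are hypothetical stand-ins for decoder
    object queries and CLIP text embeddings; `temperature` is an assumed value.
    """
    names = list(text_embeddings)
    texts = [l2_normalize(text_embeddings[n]) for n in names]
    results = []
    for q in queries:
        qn = l2_normalize(q)
        # Cosine similarity to every class embedding, sharpened by the temperature.
        logits = [sum(a * b for a, b in zip(qn, t)) / temperature for t in texts]
        # Numerically stable softmax.
        m = max(logits)
        exps = [math.exp(l - m) for l in logits]
        total = sum(exps)
        results.append({n: e / total for n, e in zip(names, exps)})
    return results


# Toy vectors only; real embeddings would come from a text encoder.
text_embeddings = {
    "liver": [1.0, 0.0, 0.0],
    "liver tumor": [0.7, 0.7, 0.0],
    "kidney": [0.0, 0.0, 1.0],
}
queries = [[0.9, 0.1, 0.0], [0.5, 0.6, 0.1]]

probs = classify_queries(queries, text_embeddings)
labels = [max(p, key=p.get) for p in probs]
print(labels)
```

Because the class set is defined only by the text embeddings, a new category name could in principle be added at inference time by appending one more entry to `text_embeddings`, without changing the queries.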