Lightweight Method for Interactive 3D Medical Image Segmentation with Multi-Round Result Fusion
Journal: arXiv
Published Date: Dec 11, 2024
Abstract
In medical imaging, precise annotation of lesions or organs is often
required. However, 3D volumetric images typically consist of hundreds or
thousands of slices, making the annotation process extremely time-consuming and
laborious. Recently, the Segment Anything Model (SAM) has drawn widespread
attention due to its remarkable zero-shot generalization capabilities in
interactive segmentation. While researchers have explored adapting SAM for
medical applications, such as using SAM adapters or constructing 3D SAM models,
a key question remains: can traditional CNNs achieve the same strong
zero-shot generalization in this task? In this paper, we propose the
Lightweight Interactive Network for 3D Medical Image Segmentation (LIM-Net), a
novel approach demonstrating the potential of compact CNN-based models. Built
upon a 2D CNN backbone, LIM-Net initiates segmentation by generating a 2D
prompt mask from user hints. This mask is then propagated through the 3D
sequence via the Memory Module. To refine and stabilize results during
interaction, the Multi-Round Result Fusion (MRF) Module selects and merges
optimal masks from multiple rounds. Our extensive experiments across multiple
datasets and modalities demonstrate LIM-Net's competitive performance. It
generalizes better to unseen data than SAM-based models, achieving
competitive accuracy with fewer interactions. Notably,
LIM-Net's lightweight design offers significant advantages in deployment and
inference efficiency, with low GPU memory consumption suitable for
resource-constrained environments. These promising results demonstrate that
LIM-Net can serve as a strong baseline, complementing and contrasting with
popular SAM models to further advance interactive medical image segmentation.
The code will be released at https://github.com/goodtime-123/LIM-Net.
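
The multi-round fusion idea described in the abstract can be illustrated with a minimal sketch. This is not the released LIM-Net implementation: the abstract does not state how masks are selected, so the per-slice confidence score, the function name `fuse_rounds`, and the "keep the highest-scoring round per slice" rule are all assumptions made for illustration only.

```python
from typing import List

Mask = List[List[int]]  # one binary 2D mask for a single slice


def fuse_rounds(rounds_masks: List[List[Mask]],
                rounds_scores: List[List[float]]) -> List[Mask]:
    """Hypothetical fusion rule: for every slice, keep the mask from the
    interaction round with the highest score for that slice.

    rounds_masks[r][s]  -> mask of slice s produced in round r
    rounds_scores[r][s] -> assumed quality score of that mask
    """
    n_rounds = len(rounds_masks)
    n_slices = len(rounds_masks[0])
    fused = []
    for s in range(n_slices):
        # max() returns the earliest round when scores tie
        best = max(range(n_rounds), key=lambda r: rounds_scores[r][s])
        fused.append(rounds_masks[best][s])
    return fused


# Toy volume: 3 slices of size 1x2, segmented in 2 interaction rounds.
round0 = [[[1, 0]], [[0, 0]], [[1, 1]]]
round1 = [[[1, 1]], [[0, 1]], [[0, 0]]]
scores = [[0.9, 0.4, 0.5],   # round 0, per slice
          [0.6, 0.8, 0.5]]   # round 1, per slice

fused = fuse_rounds([round0, round1], scores)
```

Here slice 0 comes from round 0, slice 1 from round 1, and the tie on slice 2 falls back to the earlier round. A real system would need a principled score and might merge masks at the pixel level rather than choosing whole slices.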