Diffusion-empowered AutoPrompt MedSAM
Journal: arXiv
Published Date: Feb 5, 2025
Abstract
MedSAM, a medical foundation model derived from the SAM architecture, has
demonstrated notable success across diverse medical domains. However, its
clinical application faces two major challenges: the dependency on
labor-intensive manual prompt generation, which imposes a significant burden on
clinicians, and the absence of semantic labeling in the generated segmentation
masks for organs or lesions, limiting its practicality for non-expert users. To
address these limitations, we propose AutoMedSAM, an end-to-end framework
derived from SAM, designed to enhance usability and segmentation performance.
AutoMedSAM retains MedSAM's image encoder and mask decoder structure while
introducing a novel diffusion-based class prompt encoder. The diffusion-based
encoder employs a dual-decoder structure to collaboratively generate prompt
embeddings guided by sparse and dense prompt definitions. These embeddings
enhance the model's ability to understand and process clinical imagery
autonomously. With this encoder, AutoMedSAM leverages class prompts to embed
semantic information into the model's predictions, transforming MedSAM's
semi-automated pipeline into a fully automated workflow. Furthermore,
AutoMedSAM employs an uncertainty-aware joint optimization strategy during
training to effectively inherit MedSAM's pre-trained knowledge while improving
generalization by integrating multiple loss functions. Experimental results
across diverse datasets demonstrate that AutoMedSAM achieves superior
performance while broadening its applicability to both clinical settings and
non-expert users. Code is available at
https://github.com/HP-ML/AutoPromptMedSAM.git.
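
The abstract does not spell out how the uncertainty-aware joint optimization combines its loss terms. As a rough illustration only, the sketch below assumes the commonly used learned log-variance weighting, where each loss L_i is scaled by exp(-s_i) and regularized by s_i. The function name `uncertainty_weighted_loss`, the `log_vars` parameter, and the example loss values are all hypothetical and are not taken from the AutoMedSAM code.

```python
import math


def uncertainty_weighted_loss(losses, log_vars):
    """Combine several loss values using per-loss log-variance weights.

    Assumed formulation (not confirmed by the abstract):
        total = sum_i( exp(-s_i) * L_i + s_i ),  with s_i = log(sigma_i^2)
    A larger s_i down-weights its loss term, while the additive s_i
    penalizes the trivial solution of inflating every uncertainty.
    """
    if len(losses) != len(log_vars):
        raise ValueError("losses and log_vars must have the same length")
    return sum(math.exp(-s) * loss + s for loss, s in zip(losses, log_vars))


# Hypothetical values for two loss terms (for example, a region loss and a
# boundary loss); with zero log-variances the result is the plain sum.
equal_weighting = uncertainty_weighted_loss([1.0, 2.0], [0.0, 0.0])

# Raising the second log-variance shrinks that term's contribution.
down_weighted = uncertainty_weighted_loss([1.0, 2.0], [0.0, math.log(2.0)])
```

In a training setup the `log_vars` would typically be learnable parameters updated alongside the network weights, so the balance between loss terms is tuned automatically rather than fixed by hand.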