QMaxViT-Unet+: A Query-Based MaxViT-Unet with Edge Enhancement for Scribble-Supervised Segmentation of Medical Images
Journal:
arXiv
Published Date:
Feb 14, 2025
Abstract
The deployment of advanced deep learning models for medical image
segmentation is often constrained by the requirement for extensively annotated
datasets. Weakly-supervised learning, which trains on less precise labels, has
become a promising solution to this challenge. Building on this approach, we
propose QMaxViT-Unet+, a novel framework for scribble-supervised medical image
segmentation. This framework is built on the U-Net architecture, with the
encoder and decoder replaced by Multi-Axis Vision Transformer (MaxViT) blocks.
These blocks enhance the model's ability to learn local and global features
efficiently. Additionally, our approach integrates a query-based Transformer
decoder to refine features and an edge enhancement module to compensate for the
limited boundary information in scribble labels. We evaluate the proposed
QMaxViT-Unet+ on four public datasets focused on cardiac structures, colorectal
polyps, and breast cancer: ACDC, MS-CMRSeg, SUN-SEG, and BUSI. Evaluation
metrics include the Dice similarity coefficient (DSC) and the 95th percentile
of Hausdorff distance (HD95). Experimental results show that QMaxViT-Unet+
achieves 89.1% DSC and 1.316 mm HD95 on ACDC, 88.4% DSC and 2.226 mm HD95 on
MS-CMRSeg, 71.4% DSC and 4.996 mm HD95 on SUN-SEG, and 69.4% DSC and 50.122 mm
HD95 on BUSI. These results demonstrate that our method outperforms existing
approaches in terms of accuracy, robustness, and efficiency while remaining
competitive with fully-supervised learning approaches. This makes it well suited to
medical image analysis, where high-quality annotations are often scarce and
require significant effort and expense. The code is available at:
https://github.com/anpc849/QMaxViT-Unet
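
For readers unfamiliar with the two evaluation metrics named above, the following is a minimal NumPy sketch of DSC and HD95 for 2D binary masks. It is not taken from the linked repository. The function names (`dice`, `boundary_points`, `hd95`), the 4-neighbour boundary definition, the empty-mask conventions, and the choice of taking the maximum of the two directed 95th-percentile distances are all assumptions made for illustration; published toolkits differ in these details, so values may not match the paper's exactly.

```python
import numpy as np


def dice(pred, target):
    """Dice similarity coefficient between two binary masks."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    denom = pred.sum() + target.sum()
    if denom == 0:
        return 1.0  # assumption: two empty masks count as perfect agreement
    return 2.0 * np.logical_and(pred, target).sum() / denom


def boundary_points(mask):
    """Coordinates of foreground pixels with at least one 4-neighbour in the background."""
    mask = np.asarray(mask, dtype=bool)
    padded = np.pad(mask, 1, constant_values=False)
    # A pixel is interior when all four of its direct neighbours are foreground.
    interior = (
        padded[:-2, 1:-1] & padded[2:, 1:-1] & padded[1:-1, :-2] & padded[1:-1, 2:]
    )
    return np.argwhere(mask & ~interior)


def hd95(pred, target, spacing=(1.0, 1.0)):
    """95th-percentile Hausdorff distance between mask boundaries.

    `spacing` is the physical pixel size per axis (hypothetical parameter),
    so the result is in the same unit as `spacing`.
    """
    p = boundary_points(pred) * np.asarray(spacing, dtype=float)
    t = boundary_points(target) * np.asarray(spacing, dtype=float)
    if len(p) == 0 or len(t) == 0:
        return float("nan")  # assumption: undefined when either mask is empty
    # Brute-force pairwise distances; adequate for small 2D masks.
    d = np.linalg.norm(p[:, None, :] - t[None, :, :], axis=-1)
    pred_to_target = d.min(axis=1)
    target_to_pred = d.min(axis=0)
    return float(
        max(np.percentile(pred_to_target, 95), np.percentile(target_to_pred, 95))
    )


# Toy example: a 4x4 square and the same square shifted by one column.
a = np.zeros((8, 8), dtype=bool)
b = np.zeros((8, 8), dtype=bool)
a[2:6, 2:6] = True
b[2:6, 3:7] = True
print("DSC :", dice(a, b))
print("HD95:", hd95(a, b))
```

DSC rewards region overlap, while HD95 penalizes boundary errors, which is why the two are reported together: a prediction can overlap well yet still have a poorly localized contour. The brute-force distance matrix is chosen for clarity; for large or 3D volumes a distance-transform approach would scale better.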