Rethinking Boundary Detection in Deep Learning-Based Medical Image Segmentation
Journal: arXiv
Published Date: May 6, 2025
Abstract
Medical image segmentation is a pivotal task within the realms of medical
image analysis and computer vision. While current methods have shown promise in
accurately segmenting major regions of interest, the precise segmentation of
boundary areas remains challenging. In this study, we propose a novel network
architecture named CTO, which combines Convolutional Neural Networks (CNNs),
Vision Transformer (ViT) models, and explicit edge detection operators to
tackle this challenge. CTO surpasses existing methods in terms of segmentation
accuracy and strikes a better balance between accuracy and efficiency, without
the need for additional data inputs or label injections. Specifically, CTO
adheres to the canonical encoder-decoder network paradigm, with a dual-stream
encoder network comprising a mainstream CNN stream for capturing local features
and an auxiliary StitchViT stream for integrating long-range dependencies.
Furthermore, to enhance the model's ability to learn boundary areas, we
introduce a boundary-guided decoder network that employs binary boundary masks
generated by dedicated edge detection operators to provide explicit guidance
during the decoding process. We validate the performance of CTO through
extensive experiments conducted on seven challenging medical image segmentation
datasets, namely ISIC 2016, PH2, ISIC 2018, CoNIC, LiTS17, and BTCV. Our
experimental results demonstrate that CTO achieves
state-of-the-art accuracy on these datasets while maintaining competitive model
complexity. The code has been released at:
https://github.com/xiaofang007/CTO.
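
To make the idea of a binary boundary mask produced by an edge detection operator concrete, the following is a minimal sketch in plain Python. The abstract does not name the operator or any threshold, so the Sobel kernels, the replicate padding, the `threshold` parameter, and the function name `sobel_boundary_mask` are all assumptions made for illustration, not the authors' implementation.

```python
# Illustrative sketch only: the Sobel operator is an assumed choice of
# edge detector; the source does not specify which operator CTO uses.

# Sobel kernels for horizontal and vertical intensity gradients.
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


def sobel_boundary_mask(image, threshold=1.0):
    """Return a binary boundary mask (nested lists of 0/1) for a 2D image.

    `image` is a list of equal-length rows of numbers. `threshold` is a
    hypothetical cut-off on the gradient magnitude.
    """
    h, w = len(image), len(image[0])

    def pixel(r, c):
        # Replicate padding: clamp coordinates to the image border.
        return image[min(max(r, 0), h - 1)][min(max(c, 0), w - 1)]

    mask = [[0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            gx = gy = 0.0
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    v = pixel(r + dr, c + dc)
                    gx += SOBEL_X[dr + 1][dc + 1] * v
                    gy += SOBEL_Y[dr + 1][dc + 1] * v
            magnitude = (gx * gx + gy * gy) ** 0.5
            mask[r][c] = 1 if magnitude >= threshold else 0
    return mask


# Toy input: a 6x6 image with a bright 2x2 square in the centre.
toy = [[0.0] * 6 for _ in range(6)]
for r in (2, 3):
    for c in (2, 3):
        toy[r][c] = 1.0

mask = sobel_boundary_mask(toy)
```

In this toy case the flat corners of the image yield 0 and pixels adjacent to the square yield 1. A decoder could consume such a mask as an additional guidance signal, although how CTO does so is not described in the abstract.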