Optimizing transformer-based network via advanced decoder design for medical image segmentation.

Journal: Biomedical physics & engineering express
Published Date:

Abstract

U-Net is widely used in medical image segmentation due to its simple and flexible architecture design. To address the challenges of scale and complexity in medical tasks, several variants of U-Net have been proposed. In particular, methods based on Vision Transformer (ViT), represented by Swin UNETR, have gained widespread attention in recent years. However, these improvements often focus on the encoder, overlooking the crucial role of the decoder in optimizing segmentation details. This design imbalance limits the potential for further enhancing segmentation performance. To address this issue, we analyze the roles of the main decoder components, namely the upsampling method, the skip connection, and the feature extraction module, as well as the shortcomings of existing methods. Consequently, we propose Swin DER (i.e., Swin UNETR Decoder Enhanced and Refined), which specifically optimizes the design of these three components. Swin DER performs upsampling using a learnable interpolation algorithm called offset coordinate neighborhood weighted upsampling (Onsampling) and replaces the traditional skip connection with a spatial-channel parallel attention gate (SCP AG). Additionally, Swin DER introduces deformable convolution along with an attention mechanism in the feature extraction module of the decoder. Our model design achieves excellent results, surpassing other state-of-the-art methods on both the Synapse dataset and the MSD brain tumor segmentation task. Code is available at:.
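
To make the idea of gating a skip connection with parallel spatial and channel attention more concrete, the following is a minimal sketch in plain Python. It is not the authors' SCP AG: the abstract does not specify the gate's internals, so the additive fusion, the pooling choices, the averaging of the two branches, and the function name `parallel_attention_gate` are all assumptions made for illustration, and the learnable projections a real gate would contain are omitted.

```python
import math


def sigmoid(value):
    """Logistic function mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-value))


def parallel_attention_gate(skip, gate):
    """Hypothetical gate, not the published SCP AG.

    skip, gate: lists of C channels, each a list of N flattened voxel values.
    skip comes from the encoder, gate from the coarser decoder stage
    (assumed to be already resampled to the same shape).
    """
    channels, positions = len(skip), len(skip[0])

    # Assumption: fuse the two inputs by element-wise addition.
    fused = [[skip[c][p] + gate[c][p] for p in range(positions)]
             for c in range(channels)]

    # Channel branch: one weight per channel from global average pooling.
    channel_w = [sigmoid(sum(fused[c]) / positions) for c in range(channels)]

    # Spatial branch: one weight per position from the mean across channels.
    spatial_w = [sigmoid(sum(fused[c][p] for c in range(channels)) / channels)
                 for p in range(positions)]

    # Parallel combination (assumed): average both weights, rescale the skip.
    return [[skip[c][p] * 0.5 * (channel_w[c] + spatial_w[p])
             for p in range(positions)]
            for c in range(channels)]


if __name__ == "__main__":
    encoder_features = [[1.0, 2.0], [3.0, 4.0]]
    decoder_signal = [[0.0, 0.0], [0.0, 0.0]]
    print(parallel_attention_gate(encoder_features, decoder_signal))
```

Because both branch weights lie strictly between 0 and 1, this gate can only attenuate skip features, never amplify them; a trained gate would learn which channels and positions to pass through to the decoder.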

Authors

  • Weibin Yang
    School of Information Science and Engineering, Shandong University, Tsingtao, 266237, People's Republic of China.
  • Zhiqi Dong
    School of Information Science and Engineering, Shandong University, Tsingtao, 266237, People's Republic of China.
  • Mingyuan Xu
    Shanghai Engineering Research Center of Molecular Therapeutics & New Drug Development, Shanghai Key Laboratory of Green Chemistry & Chemical Process, School of Chemistry and Molecular Engineering, East China Normal University, Shanghai 200062, China.
  • Longwei Xu
    School of Information Science and Engineering, Shandong University, Tsingtao, 266237, People's Republic of China.
  • Dehua Geng
    School of Information Science and Engineering, Shandong University, Tsingtao, 266237, People's Republic of China.
  • Yusong Li
    School of Information Science and Engineering, Shandong University, Tsingtao, 266237, People's Republic of China.
  • Pengwei Wang
    School of Information Science and Engineering, Shandong University, Qingdao, China. Electronic address: wangpw@sdu.edu.cn.