GLoG-CSUnet: Enhancing Vision Transformers with Adaptable Radiomic Features for Medical Image Segmentation
Journal: arXiv
Published Date: Jan 6, 2025
Abstract
Vision Transformers (ViTs) have shown promise in medical image semantic
segmentation (MISS) by capturing long-range correlations. However, ViTs often
struggle to model local spatial information effectively, which is essential for
accurately segmenting fine anatomical details, particularly when applied to
small datasets without extensive pre-training. We introduce the Gabor and Laplacian
of Gaussian Convolutional Swin Network (GLoG-CSUnet), a novel architecture
enhancing Transformer-based models by incorporating learnable radiomic
features. This approach integrates dynamically adaptive Gabor and Laplacian of
Gaussian (LoG) filters to capture texture, edge, and boundary information,
enhancing the feature representation processed by the Transformer model. Our
method uniquely combines the long-range dependency modeling of Transformers
with the texture analysis capabilities of Gabor and LoG features. Evaluated on
the Synapse multi-organ and ACDC cardiac segmentation datasets, GLoG-CSUnet
demonstrates significant improvements over state-of-the-art models, achieving a
1.14% increase in Dice score for Synapse and 0.99% for ACDC, with minimal
computational overhead (only 15 and 30 additional parameters, respectively).
GLoG-CSUnet's flexible design allows integration with various base models,
offering a promising approach for incorporating radiomics-inspired feature
extraction in Transformer architectures for medical image analysis. The code
implementation is available on GitHub at: https://github.com/HAAIL/GLoG-CSUnet.
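
To make the two filter families named in the abstract concrete, the following is a minimal sketch of how Gabor and LoG kernels can be generated from a handful of scalar parameters. It uses the standard textbook definitions of both filters and is not taken from the GLoG-CSUnet repository; the function names `gabor_kernel` and `log_kernel`, the parameter names (`sigma`, `theta`, `lambd`, `gamma`, `psi`), and the kernel size are illustrative assumptions. In a learnable variant, these scalars would presumably be trainable values updated by backpropagation, which would be consistent with the very small number of added parameters reported above.

```python
import math


def gabor_kernel(size, sigma, theta, lambd, gamma, psi):
    """Real-valued Gabor kernel (standard definition, hypothetical helper).

    size  : odd kernel width/height
    sigma : standard deviation of the Gaussian envelope
    theta : orientation of the filter in radians
    lambd : wavelength of the cosine carrier
    gamma : spatial aspect ratio of the envelope
    psi   : phase offset of the carrier
    """
    half = size // 2
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            # Rotate coordinates by theta.
            xr = x * math.cos(theta) + y * math.sin(theta)
            yr = -x * math.sin(theta) + y * math.cos(theta)
            envelope = math.exp(-(xr ** 2 + (gamma * yr) ** 2) / (2 * sigma ** 2))
            carrier = math.cos(2 * math.pi * xr / lambd + psi)
            row.append(envelope * carrier)
        kernel.append(row)
    return kernel


def log_kernel(size, sigma):
    """Laplacian of Gaussian kernel (standard definition, hypothetical helper)."""
    half = size // 2
    s2 = sigma ** 2
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            r2 = x * x + y * y
            value = -(1.0 / (math.pi * s2 ** 2)) * (1 - r2 / (2 * s2)) * math.exp(-r2 / (2 * s2))
            row.append(value)
        kernel.append(row)
    return kernel


if __name__ == "__main__":
    # Illustrative parameter values only; they are not from the paper.
    g = gabor_kernel(size=5, sigma=2.0, theta=0.0, lambd=4.0, gamma=0.5, psi=0.0)
    k = log_kernel(size=5, sigma=1.0)
    print("Gabor centre value:", g[2][2])
    print("LoG centre value:", k[2][2])
```

Each Gabor kernel here is controlled by five scalars and each LoG kernel by one, so a bank of such filters adds only a few values on top of a base model. Convolving the input image with these kernels yields texture-sensitive (Gabor) and edge-sensitive (LoG) feature maps that can be passed to a Transformer backbone alongside the raw image.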