Rethinking U-Net architecture in medical imaging: Advancing the efficient and interpretable UKAN-CBAM framework for colorectal polyp segmentation.

Journal: Artificial intelligence in medicine
Published Date:

Abstract

Prompt detection of colorectal polyps is essential for preventing colorectal cancer, a leading cause of cancer-related deaths worldwide. However, manual detection through medical imaging faces significant challenges, including high costs, reliance on skilled endoscopists, and susceptibility to errors, which can result in missed diagnoses and adverse health outcomes. This study proposes UKAN-CBAM, a semantic segmentation framework that combines Kolmogorov-Arnold Networks (KANs) with Convolutional Block Attention Modules (CBAM) within a U-Net architecture. Its two-phase encoder-decoder design integrates convolutional and tokenized KAN blocks to leverage the efficiency of KANs and the feature refinement capabilities of CBAM, achieving superior segmentation performance with enhanced interpretability and compactness. The framework was trained on the Kvasir-SEG dataset and validated on external datasets, including CVC-ClinicDB, CVC-ColonDB, EndoScene, PolypGen, ETIS-LaribPolypDB, and Piccolo. In addition, 10-fold cross-validation was performed to assess robustness and generalization. UKAN-CBAM outperformed state-of-the-art (SOTA) methods, achieving an mDice of 93.80%, an mIoU of 89.18%, a precision of 95.65%, a recall of 92.02%, and an accuracy of 96.21%. It was also computationally efficient, requiring only 55.99 MB of memory and 5.214 GFLOPs, with an inference time of 122.272 ms per prediction. Feature maps, heatmaps, and Grad-CAM visualizations showed that the model focuses on key regions, while the ablation studies highlighted the importance of configuration for robustness. Paired t-tests (reported with P values, confidence intervals, and standard deviations), together with the 10-fold cross-validation, further confirmed that the reported improvements were statistically significant and not due to chance. With strong generalization across diverse image and video datasets and real-time capability, UKAN-CBAM provides an effective and reliable tool for clinical applications. This integration of attention mechanisms and interpretability represents a significant step forward in medical diagnostics. Code availability: https://github.com/Faysal425/UKAN_CBAM_Segmentation.
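
For readers unfamiliar with the metrics quoted in the abstract (mDice, mIoU, precision, recall, accuracy), the following is a minimal sketch of how such scores are commonly computed from binary segmentation masks. It is not taken from the authors' repository: the function names (`confusion_counts`, `segmentation_metrics`, `mean_metrics`), the nested-list mask representation, the smoothing constant `eps`, and the choice to average per image are all assumptions made for illustration, and the paper may define its averaging differently.

```python
def confusion_counts(pred, truth):
    """Count TP, FP, FN, TN over two equally sized binary masks (nested lists of 0/1)."""
    tp = fp = fn = tn = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p and t:
                tp += 1      # polyp pixel predicted as polyp
            elif p:
                fp += 1      # background pixel predicted as polyp
            elif t:
                fn += 1      # polyp pixel missed by the prediction
            else:
                tn += 1      # background pixel predicted as background
    return tp, fp, fn, tn


def segmentation_metrics(pred, truth, eps=1e-7):
    """Per-image Dice, IoU, precision, recall, and accuracy.

    eps (an assumed smoothing constant) avoids division by zero on empty masks.
    """
    tp, fp, fn, tn = confusion_counts(pred, truth)
    return {
        "dice": 2 * tp / (2 * tp + fp + fn + eps),
        "iou": tp / (tp + fp + fn + eps),
        "precision": tp / (tp + fp + eps),
        "recall": tp / (tp + fn + eps),
        "accuracy": (tp + tn) / (tp + fp + fn + tn + eps),
    }


def mean_metrics(pairs):
    """Average each metric over (pred, truth) pairs; one assumed reading of mDice/mIoU."""
    per_image = [segmentation_metrics(pred, truth) for pred, truth in pairs]
    return {
        name: sum(scores[name] for scores in per_image) / len(per_image)
        for name in per_image[0]
    }


if __name__ == "__main__":
    # Toy 2x3 masks, purely illustrative (not data from the paper).
    pred = [[1, 1, 0], [0, 1, 0]]
    truth = [[1, 0, 0], [0, 1, 1]]
    print(segmentation_metrics(pred, truth))
    print(mean_metrics([(pred, truth), (truth, truth)]))
```

Dice and IoU both measure overlap between the predicted and reference masks, and Dice is never lower than IoU for the same mask pair, which is consistent with the reported mDice (93.80%) exceeding the reported mIoU (89.18%).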

Authors
