CvTFuse: An unsupervised medical image fusion method of gliomas T1-DWI mode.
Journal:
Magnetic resonance imaging
Published Date:
Jan 15, 2026
Abstract
BACKGROUND: DWI can provide microscopic information on the diffusion of water molecules, whereas T1WI can provide high-resolution anatomical and histological information.
PURPOSE: Accurately and effectively fusing different MRI modalities can precisely localize lesion areas and provide rich information for analyzing the nature of lesions.
METHODS: We propose CvTFuse, a dual-branch medical image fusion network that combines a convolutional neural network (CNN) and a vision transformer. CvTFuse consists of three parts: an encoder, a fusion layer, and a decoder. The encoder is divided into a CNN module and a transformer module, which extract the local and global features of the source image, respectively. To capture the contextual information of the image completely, a global context aggregation module (GCAM) is proposed, which aggregates multi-scale features extracted from the transformer branch to improve the quality of the fused image. The fusion layer employs an energy-aware and gradient-enhanced strategy to fuse features from the different MRI modalities while retaining the details of the source images. The decoder consists of five convolutional layers and two skip connections and reconstructs the fused image from the fused features.
RESULTS: Qualitatively, the method produced fused images with clear texture details and sharp boundaries, preserving the salient information of the source images to the greatest extent. Quantitatively, the method achieved average gradient, information entropy, mutual information, and visual saliency values of 4.5975, 4.9073, 2.5181, and 0.77, respectively. Compared with deep learning fusion methods such as DenseFuse, RFN-Nest, MSDNet, IFCNN, CDDFuse, and SwinFusion, the method maintained gradient information, texture information, and edge details very well, while also minimizing information loss and reducing distortion.
CONCLUSION: This method can combine information from different MR image modalities, allowing accurate localization of lesion areas. It also draws on rich clinical information to aid precise diagnosis and the formulation of treatment plans.
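Three of the quantitative metrics named in the results (average gradient, information entropy, and mutual information) have widely used textbook definitions, which can be sketched as below. This is a minimal illustration only: the abstract does not state the exact formulas, bin counts, or normalization the authors used, so the function names (`entropy`, `average_gradient`, `mutual_information`) and the specific definitions are assumptions, and the values it produces are not expected to match the reported figures. Images are represented as plain lists of rows of integer gray levels.

```python
import math
from collections import Counter


def entropy(img):
    """Shannon entropy (in bits) of the image's gray-level histogram."""
    pixels = [p for row in img for p in row]
    n = len(pixels)
    counts = Counter(pixels)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def average_gradient(img):
    """Mean of sqrt((dx^2 + dy^2) / 2) over forward differences.

    Assumed definition; other variants (e.g. different normalization)
    exist in the fusion literature.
    """
    rows, cols = len(img), len(img[0])
    total = 0.0
    for i in range(rows - 1):
        for j in range(cols - 1):
            dx = img[i][j + 1] - img[i][j]  # horizontal difference
            dy = img[i + 1][j] - img[i][j]  # vertical difference
            total += math.sqrt((dx * dx + dy * dy) / 2.0)
    return total / ((rows - 1) * (cols - 1))


def mutual_information(a, b):
    """Mutual information (in bits) between two same-sized images,
    estimated from their joint gray-level histogram."""
    pa = [p for row in a for p in row]
    pb = [p for row in b for p in row]
    n = len(pa)
    joint = Counter(zip(pa, pb))
    ca, cb = Counter(pa), Counter(pb)
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x,y) * log2( p(x,y) / (p(x) p(y)) ), with counts substituted in
        mi += (c / n) * math.log2(c * n / (ca[x] * cb[y]))
    return mi


if __name__ == "__main__":
    # Toy 2x2 "source" and "fused" images, purely illustrative.
    source = [[0, 0], [255, 255]]
    fused = [[0, 10], [240, 255]]
    print(entropy(fused), average_gradient(fused), mutual_information(source, fused))
```

In fusion evaluation, mutual information is often reported as the sum of the values computed between the fused image and each source image; whether the figure above follows that convention is not stated in the abstract.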