Enhancing breast cancer detection on screening mammogram using self-supervised learning and a hybrid deep model of Swin Transformer and Convolutional Neural Network
Journal: arXiv
Published Date: Apr 28, 2025
Abstract
Purpose: The scarcity of high-quality curated labeled medical training data
remains one of the major limitations in applying artificial intelligence (AI)
systems to breast cancer diagnosis. Deep models for mammogram analysis and mass
(or micro-calcification) detection require training with a large volume of
labeled images, which are often expensive and time-consuming to collect. To
address this challenge, we propose a novel method that leverages
self-supervised learning (SSL) and a deep hybrid model, named HybMNet,
which combines local self-attention and fine-grained feature extraction to
enhance breast cancer detection on screening mammograms.
Approach: Our method employs a two-stage learning process: (1) SSL
Pretraining: We utilize EsViT, an SSL technique, to pretrain a Swin Transformer
(Swin-T) using a limited set of mammograms. The pretrained Swin-T then serves
as the backbone for the downstream task. (2) Downstream Training: The proposed
HybMNet combines the Swin-T backbone with a CNN-based network and a novel
fusion strategy. The Swin-T employs local self-attention to identify
informative patch regions from the high-resolution mammogram, while the
CNN-based network extracts fine-grained local features from the selected
patches. A fusion module then integrates global and local information from both
networks to generate robust predictions. The HybMNet is trained end-to-end,
with the loss function combining the outputs of the Swin-T and CNN modules to
optimize feature extraction and classification performance.
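
The downstream stage described above can be illustrated with a minimal sketch. It assumes the Swin-T backbone yields one informativeness score per grid cell of the mammogram, that the highest-scoring cells are cropped as patches for the CNN, and that the training loss is a weighted sum of binary cross-entropy terms on the global, local, and fused predictions. The function names, the patch size, and the equal weights are hypothetical and are not taken from the paper.

```python
import math

def top_k_patches(scores, k, patch_size):
    """Return pixel boxes (top, left, bottom, right) for the k highest-scoring grid cells.

    `scores` is a 2D list of per-cell informativeness scores (assumed to come
    from the transformer backbone); `patch_size` is the cell size in pixels.
    """
    cells = [(s, r, c) for r, row in enumerate(scores) for c, s in enumerate(row)]
    cells.sort(key=lambda t: t[0], reverse=True)
    return [
        (r * patch_size, c * patch_size, (r + 1) * patch_size, (c + 1) * patch_size)
        for _, r, c in cells[:k]
    ]

def bce(p, y, eps=1e-7):
    """Binary cross-entropy for one predicted probability p and label y in {0, 1}."""
    p = min(max(p, eps), 1 - eps)  # clamp to avoid log(0)
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

def combined_loss(p_global, p_local, p_fused, y,
                  w_global=1.0, w_local=1.0, w_fused=1.0):
    """Weighted sum of the three branch losses (the weights are an assumption)."""
    return (w_global * bce(p_global, y)
            + w_local * bce(p_local, y)
            + w_fused * bce(p_fused, y))

# Toy 2x3 score grid standing in for the backbone's patch scores.
scores = [[0.10, 0.70, 0.20],
          [0.05, 0.90, 0.30]]
boxes = top_k_patches(scores, k=2, patch_size=256)
loss = combined_loss(p_global=0.5, p_local=0.5, p_fused=0.5, y=1)
```

In a real pipeline the boxes would index into the full-resolution image tensor and the loss would be computed over batches in a deep learning framework; the sketch only shows how patch selection and the joint objective fit together.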
Results: The proposed method was evaluated for its ability to detect breast
cancer by distinguishing between benign (normal) and malignant mammograms.
Leveraging SSL pretraining and the HybMNet model, it achieved an AUC of 0.864 (95%
CI: 0.852, 0.875) on the CMMD dataset and 0.889 (95% CI: 0.875, 0.903) on the
INbreast dataset, highlighting its effectiveness.
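
For readers who want to reproduce this style of reporting, the following is a minimal sketch of computing an AUC together with a 95% confidence interval. The abstract does not state how the intervals were obtained; the percentile bootstrap used here is an assumption, and the labels and scores are toy values, not data from CMMD or INbreast.

```python
import random

def auc(labels, scores):
    """AUC as the probability that a positive case outscores a negative one (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def bootstrap_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the AUC."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # skip resamples that contain only one class
            stats.append(auc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lower, upper

# Toy example: 1 = malignant, 0 = benign/normal.
labels = [0, 0, 1, 1, 0, 1, 0, 1]
scores = [0.10, 0.40, 0.35, 0.80, 0.20, 0.90, 0.50, 0.70]
point = auc(labels, scores)
lower, upper = bootstrap_ci(labels, scores)
```

With only eight cases the interval is very wide; on a full test set the same procedure gives an interval comparable in form to the ones reported above.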