CMAF-Net: cross-modal attention fusion with information-theoretic regularization for imbalanced breast cancer histopathology

Journal: Scientific Reports

Abstract

Breast cancer diagnosis from histopathology images remains challenging because of two intertwined factors: severe class imbalance, in which malignant cases form a small minority of samples, and the need to integrate discriminative features across multiple spatial scales. Existing methods typically address imbalance and multi-scale fusion separately, which leads to biased or redundant representations. We propose CMAF-Net, a theoretically grounded architecture that unifies information bottleneck principles with margin-based learning to address both challenges jointly. CMAF-Net uses a dual-branch CNN-Transformer backbone whose branches are fused by a Cross-Modal Attention Fusion block; this block applies temperature-controlled attention and redundancy minimization to preserve complementary local and global features. At the classification level, we introduce an Adaptive Class-Balanced Focal Loss that operationalizes margin theory under imbalance, enforcing larger margins for minority classes while adapting dynamically to feature distributions. Extensive experiments on the IDC (invasive ductal carcinoma) dataset show that CMAF-Net achieves 94.92% sensitivity and 95.52% balanced accuracy, outperforming state-of-the-art baselines by up to 8.6% on malignant detection. Under extreme 99:1 imbalance, CMAF-Net maintains 76.45% sensitivity, degrading gracefully where competing methods fail catastrophically. Cross-dataset evaluation on BreakHis confirms robust zero-shot transfer across its four magnifications (40×, 100×, 200×, and 400×), with an average sensitivity of 95.61%. Ablation studies and information-theoretic analyses validate the contribution of each component, and computational profiling shows that CMAF-Net achieves a better accuracy-efficiency trade-off than prior fusion networks. Beyond breast cancer, the framework offers a principled template for information-theoretic fusion under class imbalance, with implications for rare disease detection, clinical decision support, and broader multi-modal learning tasks.
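To make the loss idea concrete, here is a minimal pure-Python sketch of a class-balanced focal loss with per-class margins. It is not the authors' Adaptive Class-Balanced Focal Loss, whose formula the abstract does not give. Instead, it assumes three standard ingredients: class weights proportional to (1 - beta) / (1 - beta^n_c) (the "effective number of samples" weighting), focal modulation (1 - p_t)^gamma, and LDAM-style margins proportional to n_c^(-1/4), subtracted from the true-class logit so that rarer classes must be separated by a larger margin. The function names `class_balanced_weights`, `class_margins`, `cb_focal_margin_loss`, and `mean_loss`, the defaults beta = 0.999, gamma = 2, and a maximum margin of 0.5, and the example class counts are all hypothetical illustration choices.

```python
import math
from typing import List, Sequence

# Hypothetical sketch, NOT the CMAF-Net loss (the abstract does not specify it).
# Combines effective-number class weights, focal modulation, and LDAM-style margins.


def class_balanced_weights(counts: Sequence[int], beta: float = 0.999) -> List[float]:
    """w_c proportional to (1 - beta) / (1 - beta**n_c), rescaled to sum to the number of classes."""
    raw = [(1.0 - beta) / (1.0 - beta ** n) for n in counts]
    scale = len(counts) / sum(raw)
    return [w * scale for w in raw]


def class_margins(counts: Sequence[int], max_margin: float = 0.5) -> List[float]:
    """m_c proportional to n_c ** -0.25: rarer classes get larger margins, the rarest gets max_margin."""
    raw = [n ** -0.25 for n in counts]
    largest = max(raw)
    return [max_margin * r / largest for r in raw]


def cb_focal_margin_loss(
    logits: Sequence[float],
    target: int,
    weights: Sequence[float],
    margins: Sequence[float],
    gamma: float = 2.0,
    scale: float = 1.0,
) -> float:
    """Per-sample loss -w_t * (1 - p_t)**gamma * log(p_t), with p_t taken after the margin shift."""
    adjusted = list(logits)
    adjusted[target] -= margins[target]              # class-dependent margin on the true class
    z = [scale * v for v in adjusted]                # optional logit scale (LDAM often uses s > 1)
    peak = max(z)                                    # subtract the max for numerical stability
    exps = [math.exp(v - peak) for v in z]
    log_p_t = z[target] - peak - math.log(sum(exps))  # log-softmax of the true class
    p_t = math.exp(log_p_t)
    return -weights[target] * (1.0 - p_t) ** gamma * log_p_t


def mean_loss(batch_logits, targets, weights, margins, gamma=2.0, scale=1.0) -> float:
    """Average the per-sample loss over a batch."""
    losses = [
        cb_focal_margin_loss(l, t, weights, margins, gamma, scale)
        for l, t in zip(batch_logits, targets)
    ]
    return sum(losses) / len(losses)


if __name__ == "__main__":
    counts = [9900, 100]  # benign vs malignant at a 99:1 ratio (illustrative counts)
    w = class_balanced_weights(counts)
    m = class_margins(counts)
    print("weights:", w)
    print("margins:", m)
    # With tied logits, a minority-class sample incurs a larger loss than a majority-class one.
    print("loss if benign:", cb_focal_margin_loss([1.0, 1.0], 0, w, m))
    print("loss if malignant:", cb_focal_margin_loss([1.0, 1.0], 1, w, m))
```

With these counts, the malignant class receives both the larger weight and the larger margin. Setting gamma = 0, uniform weights, and zero margins reduces the sketch to ordinary softmax cross-entropy, which is a convenient sanity check. The abstract states that the authors' loss also adapts to feature distributions; that mechanism is not modeled here.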
