BreastDCEDL: Curating a Comprehensive DCE-MRI Dataset and developing a Transformer Implementation for Breast Cancer Treatment Response Prediction
Journal: arXiv
Published Date: Jun 13, 2025
Abstract
Breast cancer remains a leading cause of cancer-related mortality worldwide,
making early detection and accurate treatment response monitoring critical
priorities. We present BreastDCEDL, a curated, deep learning-ready dataset
comprising pre-treatment 3D Dynamic Contrast-Enhanced MRI (DCE-MRI) scans from
2,070 breast cancer patients drawn from the I-SPY1, I-SPY2, and Duke cohorts,
all sourced from The Cancer Imaging Archive. The raw DICOM imaging data were
rigorously converted into standardized 3D NIfTI volumes with preserved signal
integrity, accompanied by unified tumor annotations and harmonized clinical
metadata including pathologic complete response (pCR), hormone receptor (HR),
and HER2 status. Although DCE-MRI provides essential diagnostic information and
deep learning offers tremendous potential for analyzing such complex data,
progress has been limited by the lack of accessible, public, multicenter datasets.
BreastDCEDL addresses this gap by enabling the development of advanced models,
including state-of-the-art transformer architectures that require substantial
training data. To demonstrate its capacity for robust modeling, we developed
the first transformer-based model for breast DCE-MRI, leveraging a Vision
Transformer (ViT) architecture trained on RGB-fused images from three contrast
phases (pre-contrast, early post-contrast, and late post-contrast). Our ViT
model achieved state-of-the-art pCR prediction performance in HR+/HER2-
patients (AUC 0.94, accuracy 0.93). BreastDCEDL includes predefined benchmark
splits, offering a framework for reproducible research and enabling clinically
meaningful modeling in breast cancer imaging.
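
The RGB fusion of three contrast phases mentioned above can be illustrated with a minimal sketch. The abstract does not describe the authors' preprocessing, so everything here is an assumption: the function name `fuse_phases_to_rgb` is hypothetical, the inputs are taken to be co-registered 2D slices of equal shape, and the joint min-max scaling is only one plausible normalization choice.

```python
import numpy as np


def fuse_phases_to_rgb(pre, early, late):
    """Stack three DCE-MRI phase slices into one 8-bit RGB image.

    Hypothetical helper, not the authors' code. Channel order is
    R = pre-contrast, G = early post-contrast, B = late post-contrast.
    """
    phases = [np.asarray(p, dtype=np.float32) for p in (pre, early, late)]
    if not (phases[0].shape == phases[1].shape == phases[2].shape):
        raise ValueError("all three phases must have the same shape")

    # Shape (H, W, 3): one channel per contrast phase.
    stacked = np.stack(phases, axis=-1)

    # Assumption: scale all phases with a shared min and max so that
    # relative enhancement between phases is preserved.
    lo, hi = stacked.min(), stacked.max()
    if hi > lo:
        stacked = (stacked - lo) / (hi - lo)
    else:
        stacked = np.zeros_like(stacked)  # constant input maps to black

    return np.round(stacked * 255.0).astype(np.uint8)


if __name__ == "__main__":
    # Synthetic slices standing in for real scan data.
    pre = np.zeros((4, 4))
    early = np.full((4, 4), 1.0)
    late = np.full((4, 4), 5.0)
    rgb = fuse_phases_to_rgb(pre, early, late)
    print(rgb.shape, rgb.dtype)
```

An image built this way has the three-channel layout that ViT models pretrained on natural images expect, which is one reason such a fusion is convenient. Per-phase scaling would be an alternative, but it discards the intensity relationship between phases.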