Dataset of artificial breast cancer MRIs produced from unpaired mammograms.
Journal:
Data in Brief
Published Date:
Dec 19, 2025
Abstract
The development of machine learning models for medical imaging is often constrained by the scarcity of large, paired datasets, particularly in breast cancer diagnostics, where mammography and MRI offer complementary diagnostic information. We introduce a dataset of artificial breast cancer MRIs generated by a CycleGAN architecture trained on unpaired cancer mammograms and breast MRI data. This approach addresses the lack of open-source paired mammography-MRI datasets by using adversarial learning to establish cross-modal relationships without requiring direct image correspondence. Our methodology builds upon previous multi-modal imaging efforts, using unpaired translation to construct mammogram-MRI pairs and create large-scale training datasets. The generated artificial MRIs offer two main benefits: stronger patient privacy protection, because the data are synthetic, and potential cost savings in resource-limited settings where MRI acquisition is expensive. We evaluate the fidelity of the artificial MRI dataset using pre-existing tumor detection models. Used for data augmentation, this dataset supports the development of more generalizable machine learning models for cancer diagnosis and treatment planning.
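For readers unfamiliar with unpaired translation, the cycle-consistency idea that CycleGAN relies on can be sketched as below. This is an illustrative toy, not the authors' training code: the generators `mammo_to_mri` and `mri_to_mammo` are hypothetical stand-ins (simple affine maps instead of neural networks), the images are random arrays, and the adversarial loss term is omitted entirely.

```python
import numpy as np


def cycle_consistency_loss(x, y, g_xy, g_yx):
    """L1 cycle loss for two unpaired batches.

    x -> g_xy(x) -> g_yx(g_xy(x)) should recover x, and
    y -> g_yx(y) -> g_xy(g_yx(y)) should recover y.
    No pairing between individual x and y samples is needed.
    """
    forward = np.mean(np.abs(g_yx(g_xy(x)) - x))
    backward = np.mean(np.abs(g_xy(g_yx(y)) - y))
    return forward + backward


# Hypothetical toy "generators": affine maps that are exact inverses.
def mammo_to_mri(img):
    return 2.0 * img + 1.0


def mri_to_mammo(img):
    return (img - 1.0) / 2.0


rng = np.random.default_rng(0)
mammograms = rng.random((4, 8, 8))  # batch of fake mammograms
mris = rng.random((4, 8, 8))        # unrelated batch of fake MRIs (unpaired)

# Mutually inverse generators give a cycle loss of (numerically) zero.
loss_inverse = cycle_consistency_loss(mammograms, mris, mammo_to_mri, mri_to_mammo)

# Replacing one generator with the identity breaks the cycle and raises the loss.
loss_mismatch = cycle_consistency_loss(mammograms, mris, mammo_to_mri, lambda img: img)

print(loss_inverse, loss_mismatch)
```

In an actual CycleGAN, this term is added to adversarial losses from two discriminators, one per modality, so that the translated images both look realistic and remain recoverable by the reverse mapping.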