Multimodal, Multi-Disease Medical Imaging Foundation Model (MerMED-FM)
Journal: arXiv
Published Date: Jun 30, 2025
Abstract
Current artificial intelligence models for medical imaging are predominantly
single-modality and single-disease. Attempts to create multimodal and
multi-disease models have resulted in inconsistent clinical accuracy.
Furthermore, training these models typically requires large, labour-intensive,
well-labelled datasets. We developed MerMED-FM, a state-of-the-art multimodal,
multi-specialty foundation model trained using self-supervised learning and a
memory module. MerMED-FM was trained on 3.3 million medical images from over
ten specialties and seven modalities, including computed tomography (CT), chest
X-rays (CXR), ultrasound (US), pathology patches, color fundus photography
(CFP), optical coherence tomography (OCT) and dermatology images. MerMED-FM was
evaluated across multiple diseases and compared against existing foundation
models. Strong performance was achieved across all modalities, with AUROCs of
0.988 (OCT), 0.982 (pathology), 0.951 (US), 0.943 (CT), 0.931 (skin), 0.894
(CFP) and 0.858 (CXR). MerMED-FM has the potential to be a highly adaptable,
versatile, cross-specialty foundation model that enables robust medical imaging
interpretation across diverse medical disciplines.
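
For readers less familiar with the reported metric: AUROC can be read as the probability that a randomly chosen positive case receives a higher model score than a randomly chosen negative case. The sketch below computes it directly from that pairwise definition in plain Python. The function name `auroc` and the toy labels and scores are invented for illustration; they are not taken from the paper or its evaluation code.

```python
def auroc(labels, scores):
    """Area under the ROC curve via the pairwise (Mann-Whitney) definition.

    labels: iterable of 0/1 ground-truth classes (hypothetical toy input)
    scores: iterable of model scores, higher meaning "more likely positive"
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative case")

    # Count positive/negative pairs that are ranked correctly; ties count as half.
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data, invented for illustration only (not results from the paper).
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(auroc(labels, scores))  # 3 of 4 pairs ranked correctly -> 0.75
```

On this reading, a value of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation of positive and negative cases. The pairwise loop is quadratic in the number of cases, so a rank-based implementation would be preferable for large evaluation sets.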