MultiMed: Massively Multimodal and Multitask Medical Understanding
Journal:
arXiv
Published Date:
Aug 22, 2024
Abstract
Biomedical data is inherently multimodal, consisting of electronic health
records, medical imaging, digital pathology, genome sequencing, wearable
sensors, and more. The application of artificial intelligence tools to these
multifaceted sensing technologies has the potential to revolutionize the
prognosis, diagnosis, and management of human health and disease. However,
current approaches to biomedical AI typically only train and evaluate with one
or a small set of medical modalities and tasks. This limitation hampers the
development of comprehensive tools that can leverage the rich interconnected
information across many heterogeneous biomedical sensors. To address this
challenge, we present MultiMed, a benchmark designed to evaluate and enable
large-scale learning across a wide spectrum of medical modalities and tasks.
MultiMed consists of 2.56 million samples across ten medical modalities such as
medical reports, pathology, genomics, and protein data, and is structured into
eleven challenging tasks, including disease prognosis, protein structure
prediction, and medical question answering. Using MultiMed, we conduct
comprehensive experiments benchmarking state-of-the-art unimodal, multimodal,
and multitask models. Our analysis highlights the advantages of training
large-scale medical models across many related modalities and tasks. Moreover,
MultiMed enables studies of generalization across related medical concepts,
robustness to real-world noisy data and distribution shifts, and novel modality
combinations to improve prediction performance. MultiMed will be publicly
available and regularly updated and welcomes inputs from the community.