Infi-Med: Low-Resource Medical MLLMs with Robust Reasoning Evaluation
Journal:
arXiv
Published Date:
May 29, 2025
Abstract
Multimodal large language models (MLLMs) have shown promise in healthcare,
particularly for addressing complex medical tasks,
supporting multidisciplinary treatment (MDT), and enabling personalized
precision medicine. However, their practical deployment faces critical
challenges in resource efficiency, diagnostic accuracy, clinical
considerations, ethics, and privacy. To address these challenges, we propose
Infi-Med, a comprehensive framework for medical MLLMs that introduces three key
innovations: (1) a resource-efficient approach that curates and constructs
high-quality supervised fine-tuning (SFT) datasets with minimal sample
requirements, designed to extend to both the pretraining and post-training
phases; (2) enhanced multimodal reasoning
capabilities for cross-modal integration and clinical task understanding; and
(3) a systematic evaluation system that assesses model performance across
medical modalities and task types. Our experiments demonstrate that Infi-Med
achieves state-of-the-art (SOTA) performance in general medical reasoning while
maintaining rapid adaptability to clinical scenarios. The framework establishes
a solid foundation for deploying MLLMs in real-world healthcare settings by
balancing model effectiveness with operational constraints.