Multimodal Large Language Models for Medical Report Generation via Customized Prompt Tuning
Journal: arXiv
Published Date: Jun 18, 2025
Abstract
Medical report generation from imaging data remains a challenging task in
clinical practice. While large language models (LLMs) show great promise in
addressing this challenge, how to integrate them effectively with medical
imaging data remains underexplored. In this paper, we present MRG-LLM, a
novel multimodal large language model (MLLM) that combines a frozen LLM with a
learnable visual encoder and introduces a dynamic prompt customization
mechanism. Our key innovation lies in generating instance-specific prompts
tailored to individual medical images through conditional affine
transformations derived from visual features. We propose two implementations:
prompt-wise and promptbook-wise customization, enabling precise and targeted
report generation. Extensive experiments on the IU X-ray and MIMIC-CXR datasets
demonstrate that MRG-LLM achieves state-of-the-art performance in medical
report generation. Our code will be made publicly available.
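The idea of deriving instance-specific prompts from visual features through a conditional affine transformation can be illustrated with a small sketch. This sketch assumes a FiLM-style modulation. Two linear heads map a pooled visual feature vector to a scale γ and a shift β. Here the feature vector is a hypothetical stand-in for the output of the learnable visual encoder. Each learnable prompt vector p then becomes (1 + γ) ⊙ p + β. The class `ConditionalPromptCustomizer`, the `per_prompt` flag, the (1 + γ) parameterization and the random initialization are illustrative assumptions, not the authors' code. `per_prompt=True` (one γ, β pair per prompt vector) and `per_prompt=False` (one pair shared across the prompt set) are only loose analogues of the paper's prompt-wise and promptbook-wise variants, whose exact definitions are not given here.

```python
import random
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(weight: Matrix, x: Vector) -> Vector:
    """Matrix-vector product; weight has shape (out_dim, in_dim)."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weight]


def random_matrix(rows: int, cols: int, rng: random.Random, scale: float) -> Matrix:
    """Uniform random init in [-scale, scale] (illustrative, not the paper's init)."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


class ConditionalPromptCustomizer:
    """Hypothetical FiLM-style prompt customizer (an assumption, not MRG-LLM's code).

    Holds num_prompts learnable base prompts of size dim. Two linear heads map a
    pooled visual feature to a scale (gamma) and shift (beta), and each prompt p
    becomes (1 + gamma) * p + beta, element-wise.
    per_prompt=True:  a separate (gamma, beta) pair for every prompt vector.
    per_prompt=False: a single (gamma, beta) pair shared by the whole prompt set.
    """

    def __init__(self, num_prompts: int, dim: int, visual_dim: int,
                 per_prompt: bool = True, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.dim = dim
        self.per_prompt = per_prompt
        # Shared base prompts, identical for every image before customization.
        self.base_prompts = random_matrix(num_prompts, dim, rng, scale=1.0)
        out_dim = num_prompts * dim if per_prompt else dim
        # Linear heads predicting the affine parameters from the visual feature.
        self.w_gamma = random_matrix(out_dim, visual_dim, rng, scale=0.1)
        self.w_beta = random_matrix(out_dim, visual_dim, rng, scale=0.1)

    def __call__(self, visual_feature: Vector) -> Matrix:
        gamma = matvec(self.w_gamma, visual_feature)
        beta = matvec(self.w_beta, visual_feature)
        customized = []
        for i, prompt in enumerate(self.base_prompts):
            # Take this prompt's own slice (per_prompt) or reuse the shared pair.
            start = i * self.dim if self.per_prompt else 0
            g = gamma[start:start + self.dim]
            b = beta[start:start + self.dim]
            customized.append([(1.0 + gj) * pj + bj
                               for pj, gj, bj in zip(prompt, g, b)])
        return customized


if __name__ == "__main__":
    customizer = ConditionalPromptCustomizer(num_prompts=4, dim=8, visual_dim=16)
    prompts_a = customizer([1.0] * 16)   # stand-in feature for image A
    prompts_b = customizer([-1.0] * 16)  # stand-in feature for image B
    print(len(prompts_a), len(prompts_a[0]))  # 4 8
    print(prompts_a != prompts_b)             # True
```

With a zero visual feature, γ and β vanish and the prompts reduce to the shared base prompts. Different images therefore receive different prompts from the same learned parameters. The (1 + γ) form is chosen so that the customization starts close to the identity. In a typical prompt-tuning setup, such vectors would be prepended to a frozen LLM's input embeddings. How MRG-LLM feeds them to its LLM is beyond what this abstract describes.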