Improving Medical Reasoning with Curriculum-Aware Reinforcement Learning
Journal:
arXiv
Published Date:
May 25, 2025
Abstract
Recent advances in reinforcement learning with verifiable, rule-based rewards
have greatly enhanced the reasoning capabilities and out-of-distribution
generalization of VLMs/LLMs, obviating the need for manually crafted reasoning
chains. Despite these promising developments in the general domain, their
translation to medical imaging remains limited. Current medical reinforcement
fine-tuning (RFT) methods predominantly focus on close-ended VQA, thereby
restricting the model's ability to engage in world knowledge retrieval and
flexible task adaptation. More critically, these methods fall short of
addressing the critical clinical demand for open-ended, reasoning-intensive
decision-making. To bridge this gap, we introduce \textbf{MedCCO}, the first
multimodal reinforcement learning framework tailored for medical VQA that
unifies close-ended and open-ended data within a curriculum-driven RFT
paradigm. Specifically, MedCCO is initially fine-tuned on a diverse set of
close-ended medical VQA tasks to establish domain-grounded reasoning
capabilities, and is then progressively adapted to open-ended tasks to foster
deeper knowledge enhancement and clinical interpretability. We validate MedCCO
across eight challenging medical VQA benchmarks, spanning both close-ended and
open-ended settings. Experimental results show that MedCCO consistently
enhances performance and generalization, achieving a 11.4\% accuracy gain
across three in-domain tasks, and a 5.7\% improvement on five out-of-domain
benchmarks. These findings highlight the promise of curriculum-guided RL in
advancing robust, clinically-relevant reasoning in medical multimodal language
models.