Improving the Reasoning of Multi-Image Grounding in MLLMs via Reinforcement Learning

Journal: arXiv

Published Date: Jul 1, 2025

Abstract

Multimodal Large Language Models (MLLMs) now excel at visual grounding in single-image scenarios with textual references. However, their performance degrades in real-world applications that involve complex multi-image compositions and multimodal instructions, which reveals limitations in cross-image reasoning and generalization. To address these challenges, we adopt a Reinforcement Learning (RL) based post-training strategy to improve the reasoning performance of MLLMs in multi-image grounding tasks. Our approach begins by synthesizing high-quality chain-of-thought (CoT) data for cold-start initialization, followed by supervised fine-tuning (SFT) using low-rank adaptation (LoRA). The cold-start training stage enables the model to identify correct solutions. Subsequently, we perform rejection sampling using the merged SFT model to curate high-quality RL data and leverage rule-based RL to guide the model toward optimal reasoning paths. Extensive experimental results demonstrate the effectiveness of our approach, which achieves improvements of +9.04% on MIG-Bench and +4.98% on several out-of-domain reasoning grounding benchmarks over the SFT baseline. Furthermore, our approach exhibits strong generalization in multi-image perception, with gains of +3.1% and +2.4% over the base model on subsets of the BLINK and MMIU benchmarks, respectively.
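
The abstract names rejection sampling and rule-based RL but does not spell out the reward or the filtering rule. The sketch below shows one plausible shape for both, and everything in it is an assumption made for illustration: the `<think>`/`<answer>` response template, the IoU threshold of 0.5, the 0/1/2 reward scale, and the helper names `iou`, `parse_box`, `reward`, and `keep_for_rl` are hypothetical and are not taken from the paper.

```python
import re
from typing import Optional, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), assumed convention

# Assumed response template: reasoning in <think>, final box in <answer>.
TEMPLATE = re.compile(r"^<think>.*?</think>\s*<answer>(.*?)</answer>$", re.DOTALL)
NUMBER = re.compile(r"-?\d+(?:\.\d+)?")


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = max(0.0, a[2] - a[0]) * max(0.0, a[3] - a[1])
    area_b = max(0.0, b[2] - b[0]) * max(0.0, b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def parse_box(response: str) -> Optional[Box]:
    """Return the predicted box, or None if the template is not followed."""
    match = TEMPLATE.match(response.strip())
    if match is None:
        return None
    numbers = NUMBER.findall(match.group(1))
    if len(numbers) != 4:
        return None
    x1, y1, x2, y2 = (float(n) for n in numbers)
    return (x1, y1, x2, y2)


def reward(response: str, target: Box, threshold: float = 0.5) -> float:
    """Hypothetical rule-based reward: 1 for format, plus 1 for IoU >= threshold."""
    box = parse_box(response)
    if box is None:
        return 0.0
    return 1.0 + (1.0 if iou(box, target) >= threshold else 0.0)


def keep_for_rl(responses: Sequence[str], target: Box, threshold: float = 0.5) -> bool:
    """Hypothetical rejection-sampling filter over samples from the SFT model.

    Keeps a prompt only if some, but not all, sampled responses are fully
    correct, so the retained RL data is neither trivial nor unsolvable.
    """
    hits = sum(reward(r, target, threshold) == 2.0 for r in responses)
    return 0 < hits < len(responses)
```

A filter of this kind would be run once per prompt over several responses sampled from the merged SFT model; whether the paper keeps only mixed-outcome prompts, as assumed here, is not stated in the abstract.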

Authors

Bob Zhang
Haoran Li
Tao Zhang
Cilin Yan
Jiayin Cai
Xiaolong Jiang
Yanbin Hao

External Resources

View on arXiv (http://arxiv.org/abs/2507.00748v1)
