Evaluating MLLMs with Multimodal Multi-image Reasoning Benchmark

Journal: arXiv

Published Date: Jun 4, 2025

Abstract

With enhanced capabilities and widespread applications, Multimodal Large Language Models (MLLMs) are increasingly required to process and reason over multiple images simultaneously. However, existing MLLM benchmarks focus either on single-image visual reasoning or on multi-image understanding tasks with only final-answer evaluation, leaving the reasoning capabilities of MLLMs over multi-image inputs largely underexplored. To address this gap, we introduce the Multimodal Multi-image Reasoning Benchmark (MMRB), the first benchmark designed to evaluate structured visual reasoning across multiple images. MMRB comprises 92 sub-tasks covering spatial, temporal, and semantic reasoning, with multi-solution, CoT-style annotations generated by GPT-4o and refined by human experts. A derivative subset is designed to evaluate multimodal reward models in multi-image scenarios. To support fast and scalable evaluation, we propose a sentence-level matching framework using open-source LLMs. Extensive baseline experiments on 40 MLLMs, including 9 reasoning-specific models and 8 reward models, demonstrate that open-source MLLMs still lag significantly behind commercial MLLMs in multi-image reasoning tasks. Furthermore, current multimodal reward models are nearly incapable of handling multi-image reward ranking tasks.

Authors

Ziming Cheng
Binrui Xu
Lisheng Gong
Zuhe Song
Tianshuo Zhou
Shiqi Zhong
Siyu Ren
Mingxiang Chen
Xiangchao Meng
Yuxin Zhang
Yanlin Li
Lei Ren
Wei Chen
Zhiyuan Huang
Mingjie Zhan
Xiaojie Wang
Fangxiang Feng

External Resources

View on arXiv: http://arxiv.org/abs/2506.04280v1
