SITE: towards Spatial Intelligence Thorough Evaluation

Journal: arXiv

Published Date: May 8, 2025

Abstract

Spatial intelligence (SI) represents a cognitive ability encompassing the visualization, manipulation, and reasoning about spatial relationships, underpinning disciplines from neuroscience to robotics. We introduce SITE, a benchmark dataset towards SI Thorough Evaluation in a standardized format of multi-choice visual question-answering, designed to assess large vision-language models' spatial intelligence across diverse visual modalities (single-image, multi-image, and video) and SI factors (figural to environmental scales, spatial visualization and orientation, intrinsic and extrinsic, static and dynamic). Our approach to curating the benchmark combines a bottom-up survey about 31 existing datasets and a top-down strategy drawing upon three classification systems in cognitive science, which prompt us to design two novel types of tasks about view-taking and dynamic scenes. Extensive experiments reveal that leading models fall behind human experts especially in spatial orientation, a fundamental SI factor. Moreover, we demonstrate a positive correlation between a model's spatial reasoning proficiency and its performance on an embodied AI task.

Authors

Wenqi Wang
Reuben Tan
Pengyue Zhu
Jianwei Yang
Zhengyuan Yang
Lijuan Wang
Andrey Kolobov
Jianfeng Gao
Boqing Gong

External Resources

View on arXiv arXiv (http://arxiv.org/abs/2505.05456v1)

SITE: towards Spatial Intelligence Thorough Evaluation

Abstract

Authors

Categories

External Resources

Popular Topics

Recent Journals

SITE: towards Spatial Intelligence Thorough Evaluation

Abstract

Authors

Categories

External Resources

Don't Miss the Future of Medicine

Popular Topics

Recent Journals