3DBench: A scalable benchmark for object and scene-level instruction-tuning of 3D large language models.

Journal: Neural networks : the official journal of the International Neural Network Society
Published Date:

Abstract

Recent evaluations of Multi-Modal Large Language Models (MLLMs) have been extensive; however, a fine-grained benchmark that pairs point cloud data with language remains absent, leading to superficial comparisons that obscure progress on the nuanced capabilities of such models. Existing benchmarks typically cover object-level classification, scene-level captioning, and visual grounding (VG). These tasks neither adequately capture the spatial perception and logical reasoning abilities of MLLMs nor permit a fair, comprehensive comparison of MLLMs with different architectures. To address these gaps, we propose 3DBench, a fine-grained benchmark designed specifically for MLLMs. It comprises ten tasks spanning the object and scene levels, organized into three evaluation categories: expression, perception, and reasoning. Additionally, we present a scalable pipeline for constructing 3D instruction-tuning datasets from simulation environments, yielding over 239k question-answer pairs covering twelve tasks, each paired with its corresponding point clouds. Trained on this high-quality dataset, we introduce the Bench-model, which integrates advanced detection models to substantially improve MLLM performance. We compare the Bench-model against open-source 3D LLMs, analyzing the impact of different model architectures, training protocols, and public datasets. These experiments highlight the limitations of existing research and suggest promising directions for future investigation. Code and datasets are available at https://github.com/Inshsang/3DBench.
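
As a point of reference for readers, the sketch below illustrates what one instruction-tuning record pairing a question-answer item with a point cloud might look like. The field names, task label, and file layout here are assumptions for illustration only; the actual 3DBench format is defined in the project's GitHub repository linked above.

```python
# Illustrative sketch only: field names and file layout are assumed,
# not the actual 3DBench schema (see https://github.com/Inshsang/3DBench).
import json
import numpy as np

# Hypothetical instruction-tuning record: a QA pair plus a reference to its point cloud.
record = {
    "task": "scene_captioning",              # assumed task label
    "point_cloud": "scenes/scene_0001.npy",  # assumed path to an (N, 6) array: xyz + rgb
    "question": "Describe the objects in this room and how they are arranged.",
    "answer": "A sofa faces a low table near the window, with two chairs to its left.",
}

def load_example(rec):
    """Load the point cloud referenced by a record and return it with the QA pair."""
    points = np.load(rec["point_cloud"])     # shape (N, 6), assumed x, y, z, r, g, b
    return points, rec["question"], rec["answer"]

if __name__ == "__main__":
    print(json.dumps(record, indent=2))
```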

Authors

  • Tianci Hu
    Key Laboratory of Specialty Fiber Optics and Optical Access Networks, Joint International Research Laboratory of Specialty Fiber Optics and Advanced Communication, Shanghai Institute of Advanced Communication and Data Science, Shanghai University, Shanghai, 200400, China. Electronic address: yinsh@shu.edu.cn.
  • Junjie Zhang
    Department of Immunology, School of Basic Medical Sciences, Anhui Medical University, Hefei, 230032, PR China.
  • Yutao Rao
    Key Laboratory of Specialty Fiber Optics and Optical Access Networks, Joint International Research Laboratory of Specialty Fiber Optics and Advanced Communication, Shanghai Institute of Advanced Communication and Data Science, Shanghai University, Shanghai, 200400, China. Electronic address: ryt756227875@shu.edu.cn.
  • Dan Zeng
    Key Laboratory of Specialty Fiber Optics and Optical Access Networks, Joint International Research Laboratory of Specialty Fiber Optics and Advanced Communication, Shanghai Institute of Advanced Communication and Data Science, Shanghai University, Shanghai, 200400, China. Electronic address: dzeng@shu.edu.cn.
  • Hongwen Yu
    Key Laboratory of Specialty Fiber Optics and Optical Access Networks, Joint International Research Laboratory of Specialty Fiber Optics and Advanced Communication, Shanghai Institute of Advanced Communication and Data Science, Shanghai University, Shanghai, 200400, China. Electronic address: hw_yu@shu.edu.cn.
  • Xiaoshui Huang
    School of Public Health, Shanghai Jiao Tong University, China.