GLRD: Global-Local Collaborative Reason and Debate with PSL for 3D Open-Vocabulary Detection
Journal: arXiv
Published Date: Mar 26, 2025
Abstract
The task of LiDAR-based 3D Open-Vocabulary Detection (3D OVD) requires the
detector to learn to detect novel objects from point clouds without
off-the-shelf training labels. Previous methods focus on learning
object-level representations and ignore scene-level information, which makes it
hard to distinguish objects of similar classes. In this work, we propose a
Global-Local Collaborative Reason and Debate with PSL (GLRD) framework for the
3D OVD task, considering both local object-level information and global
scene-level information. Specifically, an LLM performs common-sense
reasoning over object-level and scene-level information, and the
detection results are refined accordingly. To help the LLM make more
precise decisions, we also design a probabilistic soft logic solver (OV-PSL) to
search for the optimal solution, and a debate scheme to confirm the class of
confusable objects. In addition, to alleviate the uneven distribution of
classes, a static balance scheme (SBC) and a dynamic balance scheme (DBC) are
designed. Furthermore, to reduce the influence of noise in data and training,
we propose Reflected Pseudo Labels Generation (RPLG) and
Background-Aware Object Localization (BAOL). Extensive experiments on
ScanNet and SUN RGB-D demonstrate the superiority of GLRD. In the partial
open-vocabulary setting, the absolute improvements in mean average precision
are $+2.82\%$ on SUN RGB-D and $+3.72\%$ on ScanNet. In the full open-vocabulary
setting, the absolute improvements in mean average precision are $+4.03\%$ on
ScanNet and $+14.11\%$ on SUN RGB-D.
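
The abstract does not describe how OV-PSL is formulated, so the following is only a generic sketch of probabilistic soft logic, not the authors' solver. It assumes the standard PSL ingredients: truth values in [0, 1], the Lukasiewicz relaxation of conjunction, and a weighted hinge penalty ("distance to satisfaction") per rule. The variable names (`is_desk`, `is_table`), the rules, their weights, and the evidence scores are all hypothetical, and the exhaustive grid search stands in for the convex optimization that real PSL solvers use.

```python
from itertools import product


def luk_and(*truths):
    """Lukasiewicz soft conjunction of truth values in [0, 1]."""
    return max(0.0, sum(truths) - (len(truths) - 1))


def distance_to_satisfaction(body, head):
    """Hinge penalty for the soft rule `body -> head` (0 when satisfied)."""
    return max(0.0, body - head)


def total_loss(assignment, rules):
    """Weighted sum of rule penalties for one candidate assignment."""
    return sum(
        weight * distance_to_satisfaction(body(assignment), head(assignment))
        for weight, body, head in rules
    )


def solve(variables, rules, steps=10):
    """Exhaustive grid search over truth values (illustration only)."""
    grid = [i / steps for i in range(steps + 1)]
    best = min(
        product(grid, repeat=len(variables)),
        key=lambda values: total_loss(dict(zip(variables, values)), rules),
    )
    return dict(zip(variables, best))


# Hypothetical evidence for one confusable object (made-up numbers).
DETECTOR_DESK, DETECTOR_TABLE, SCENE_OFFICE = 0.6, 0.55, 0.9

# Each rule is (weight, body, head); body and head map an assignment to [0, 1].
rules = [
    # Object-level evidence: detector scores support each class.
    (1.0, lambda a: DETECTOR_DESK, lambda a: a["is_desk"]),
    (1.0, lambda a: DETECTOR_TABLE, lambda a: a["is_table"]),
    # Scene-level evidence: an office scene supports "desk".
    (2.0, lambda a: SCENE_OFFICE, lambda a: a["is_desk"]),
    # Mutual exclusion: the object should not be both classes at once.
    (5.0, lambda a: luk_and(a["is_desk"], a["is_table"]), lambda a: 0.0),
]

result = solve(["is_desk", "is_table"], rules)
print(result)
```

In this toy setup the two detector scores are nearly tied, and the scene-level rule is what tips the assignment toward `is_desk`. This mirrors, in a very reduced form, the idea of combining local object-level and global scene-level information.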