VISTA: Open-Vocabulary, Task-Relevant Robot Exploration with Online Semantic Gaussian Splatting
Journal: arXiv
Published Date: Jul 1, 2025
Abstract
We present VISTA (Viewpoint-based Image selection with Semantic Task
Awareness), an active exploration method for robots to plan informative
trajectories that improve 3D map quality in areas most relevant for task
completion. Given an open-vocabulary search instruction (e.g., "find a
person"), VISTA enables a robot to explore its environment to search for the
object of interest, while simultaneously building a real-time semantic 3D
Gaussian Splatting reconstruction of the scene. The robot navigates its
environment by planning receding-horizon trajectories that prioritize semantic
similarity to the query and exploration of unseen regions of the environment.
To evaluate trajectories, VISTA introduces a novel, efficient
viewpoint-semantic coverage metric that quantifies both the geometric view
diversity and task relevance in the 3D scene. On static datasets, our coverage
metric outperforms state-of-the-art baselines, FisherRF and Bayes' Rays, in
computation speed and reconstruction quality. In quadrotor hardware
experiments, VISTA achieves a 6x higher success rate than baseline methods in
challenging maps while matching baseline performance in less challenging
maps. Lastly, we show that VISTA is platform-agnostic by deploying
it on a quadrotor drone and a Spot quadruped robot. Open-source code will be
released upon acceptance of the paper.
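
To make the planning idea concrete, the following is a minimal toy sketch of
scoring candidate trajectories by a combination of view diversity and task
relevance, then picking the best one in a receding-horizon fashion. It is not
the VISTA implementation or its coverage metric: it works on 2D points rather
than a Gaussian Splatting map, and every name and parameter here
(direction_bin, trajectory_score, plan_step, max_range, explore_weight, the
per-cell similarity list) is a hypothetical assumption chosen for illustration.

```python
import math


def direction_bin(viewpoint, cell, n_bins=8):
    """Quantize the bearing from a cell to a viewpoint into one of n_bins sectors."""
    angle = math.atan2(viewpoint[1] - cell[1], viewpoint[0] - cell[0])
    return int((angle % (2 * math.pi)) / (2 * math.pi) * n_bins) % n_bins


def trajectory_score(trajectory, cells, similarity, seen,
                     max_range=3.0, explore_weight=0.2):
    """Score a trajectory by the (cell, direction-bin) pairs it would newly observe.

    Assumption: each new pair contributes the cell's semantic similarity to the
    query plus a constant exploration bonus. This is a stand-in, not the
    paper's viewpoint-semantic coverage metric.
    """
    newly_seen = set()
    score = 0.0
    for viewpoint in trajectory:
        for idx, cell in enumerate(cells):
            if math.dist(viewpoint, cell) > max_range:
                continue  # cell is outside the assumed sensing range
            key = (idx, direction_bin(viewpoint, cell))
            if key in seen or key in newly_seen:
                continue  # this cell was already covered from this direction
            newly_seen.add(key)
            score += similarity[idx] + explore_weight
    return score, newly_seen


def plan_step(candidates, cells, similarity, seen):
    """Return the highest-scoring candidate trajectory (one receding-horizon step)."""
    return max(candidates,
               key=lambda t: trajectory_score(t, cells, similarity, seen)[0])


# Two map cells: one weakly and one strongly related to the query.
cells = [(0.0, 0.0), (10.0, 0.0)]
similarity = [0.1, 0.9]
seen = set()

near_irrelevant = [(1.0, 0.0), (0.0, 1.0)]
near_relevant = [(9.0, 0.0), (10.0, 1.0)]

best = plan_step([near_irrelevant, near_relevant], cells, similarity, seen)
# Record what the chosen trajectory observed before replanning.
seen |= trajectory_score(best, cells, similarity, seen)[1]
next_best = plan_step([near_irrelevant, near_relevant], cells, similarity, seen)
```

In this toy setup the first plan favors the trajectory near the high-similarity
cell. Once those views are recorded in seen, the same trajectory yields no new
coverage, so the next plan shifts to the unexplored region.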