Learning human-to-robot handovers through 3D scene reconstruction
Journal: arXiv
Published Date: Jul 11, 2025
Abstract
Learning robot manipulation policies from raw, real-world image data requires
a large number of robot-action trials in the physical environment. Although
training using simulations offers a cost-effective alternative, the visual
domain gap between simulation and robot workspace remains a major limitation.
Gaussian Splatting visual reconstruction methods have recently provided new
directions for robot manipulation by generating realistic environments. In this
paper, we propose the first method for supervised learning of robot handovers
solely from RGB images, without the need for real-robot training or real-robot
data collection. The proposed policy learner, Human-to-Robot
Handover using Sparse-View Gaussian Splatting (H2RH-SGS), leverages sparse-view
Gaussian Splatting reconstruction of human-to-robot handover scenes to generate
robot demonstrations containing image-action pairs captured with a camera
mounted on the robot gripper. As a result, the simulated camera pose changes in
the reconstructed scene can be directly translated into gripper pose changes.
We train a robot policy on demonstrations collected with 16 household objects
and directly deploy this policy in the real environment. Experiments in both
Gaussian Splatting reconstructed scenes and real-world human-to-robot handover
settings demonstrate that H2RH-SGS serves as a new and effective
representation for the human-to-robot handover task.
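
The abstract states that simulated camera pose changes in the reconstructed scene translate directly into gripper pose changes because the camera is mounted on the gripper. A minimal sketch of that relationship is given below, assuming a fixed, known gripper-to-camera (hand-eye) transform and poses stored as 4x4 homogeneous matrices. The function names (`camera_delta`, `gripper_delta`) and the transform `T_gripper_cam` are hypothetical and illustrate the geometry only; they are not taken from the H2RH-SGS implementation.

```python
import numpy as np


def make_pose(rotation, translation):
    """Build a 4x4 homogeneous transform from a 3x3 rotation and a 3-vector."""
    T = np.eye(4)
    T[:3, :3] = rotation
    T[:3, 3] = translation
    return T


def rot_z(angle):
    """Rotation matrix for a rotation of `angle` radians about the z-axis."""
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])


def invert_pose(T):
    """Invert a rigid transform using R^T instead of a general matrix inverse."""
    R, t = T[:3, :3], T[:3, 3]
    return make_pose(R.T, -R.T @ t)


def camera_delta(T_world_cam_a, T_world_cam_b):
    """Relative motion from camera pose a to camera pose b, in camera frame a."""
    return invert_pose(T_world_cam_a) @ T_world_cam_b


def gripper_delta(delta_cam, T_gripper_cam):
    """Map a camera-frame motion to the equivalent gripper-frame motion.

    Assumes the camera is rigidly attached to the gripper, so that
    T_world_cam = T_world_gripper @ T_gripper_cam at every time step.
    """
    return T_gripper_cam @ delta_cam @ invert_pose(T_gripper_cam)
```

Under the rigid-mount assumption, the conjugation in `gripper_delta` follows from substituting `T_world_gripper = T_world_cam @ inv(T_gripper_cam)` into the gripper's relative motion between the two time steps. A camera step sampled in the reconstructed scene can then be paired with the rendered image as an image-action training example.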