SOUS VIDE: Cooking Visual Drone Navigation Policies in a Gaussian Splatting Vacuum
Journal:
arXiv
Published Date:
Dec 20, 2024
Abstract
We propose a new simulator, training approach, and policy architecture,
collectively called SOUS VIDE, for end-to-end visual drone navigation. Our
trained policies exhibit zero-shot sim-to-real transfer with robust real-world
performance using only onboard perception and computation. Our simulator,
called FiGS, couples a computationally simple drone dynamics model with a high
visual fidelity Gaussian Splatting scene reconstruction. FiGS can quickly
simulate drone flights producing photorealistic images at up to 130 fps. We use
FiGS to collect 100k-300k image/state-action pairs from an expert MPC with
privileged state and dynamics information, randomized over dynamics parameters
and spatial disturbances. We then distill this expert MPC into an end-to-end
visuomotor policy with a lightweight neural architecture, called SV-Net. SV-Net
processes color image, optical flow and IMU data streams into low-level thrust
and body rate commands at 20 Hz onboard a drone. Crucially, SV-Net includes a
learned module for low-level control that adapts at runtime to variations in
drone dynamics. In a campaign of 105 hardware experiments, we show SOUS VIDE
policies to be robust to 30% mass variations, 40 m/s wind gusts, 60% changes in
ambient brightness, shifting or removing objects from the scene, and people
moving aggressively through the drone's visual field. Code, data, and
experiment videos can be found on our project page:
https://stanfordmsl.github.io/SousVide/.