Hybrid Rendering for Multimodal Autonomous Driving: Merging Neural and Physics-Based Simulation
Journal: arXiv
Published Date: Mar 12, 2025
Abstract
Neural reconstruction models for autonomous driving simulation have made
significant strides in recent years, with dynamic models becoming increasingly
prevalent. However, these models are typically limited to handling in-domain
objects closely following their original trajectories. We introduce a hybrid
approach that combines the strengths of neural reconstruction with
physics-based rendering. This method enables the virtual placement of
traditional mesh-based dynamic agents at arbitrary locations, adjustments to
environmental conditions, and rendering from novel camera viewpoints. Our
approach significantly enhances novel view synthesis quality, especially for
road surfaces and lane markings, while maintaining interactive frame rates
through our novel training method, NeRF2GS. This technique leverages the
superior generalization capabilities of NeRF-based methods and the real-time
rendering speed of 3D Gaussian Splatting (3DGS). We achieve this by training a
customized NeRF model on the original images with depth regularization derived
from a noisy LiDAR point cloud, then using it as a teacher model for 3DGS
training. The teacher model supplies accurate depth, surface normals, and
camera appearance modeling as supervision. With our block-based training
parallelization, the method can handle large-scale reconstructions (at least
100,000 square meters) and predict segmentation masks, surface
normals, and depth maps. During simulation, it supports a rasterization-based
rendering backend with depth-based composition and multiple camera models for
real-time camera simulation, as well as a ray-traced backend for precise LiDAR
simulation.
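
The abstract does not describe how the depth-based composition is implemented. As a rough illustration only, the sketch below assumes the simplest form: a per-pixel hard depth test that keeps the mesh-rendered pixel wherever a mesh agent lies closer to the camera than the neural reconstruction. The function name `composite_by_depth`, the nested-list image layout, and the use of `None` for pixels not covered by any mesh are hypothetical choices made for this example; they are not taken from the paper.

```python
from typing import List, Optional, Tuple

Pixel = Tuple[float, float, float]  # RGB in [0, 1]


def composite_by_depth(
    neural_rgb: List[List[Pixel]],
    neural_depth: List[List[float]],
    mesh_rgb: List[List[Pixel]],
    mesh_depth: List[List[Optional[float]]],
) -> Tuple[List[List[Pixel]], List[List[float]]]:
    """Merge two renders of the same view with a per-pixel depth test.

    Assumption (not from the paper): a hard z-test with no blending.
    A mesh depth of None means no mesh fragment covers that pixel.
    """
    out_rgb: List[List[Pixel]] = []
    out_depth: List[List[float]] = []
    for n_row, nd_row, m_row, md_row in zip(
        neural_rgb, neural_depth, mesh_rgb, mesh_depth
    ):
        rgb_row: List[Pixel] = []
        depth_row: List[float] = []
        for n_px, n_d, m_px, m_d in zip(n_row, nd_row, m_row, md_row):
            if m_d is not None and m_d < n_d:
                # Mesh agent is in front of the reconstructed scene.
                rgb_row.append(m_px)
                depth_row.append(m_d)
            else:
                # Scene occludes the mesh, or no mesh covers this pixel.
                rgb_row.append(n_px)
                depth_row.append(n_d)
        out_rgb.append(rgb_row)
        out_depth.append(depth_row)
    return out_rgb, out_depth


# Tiny 1x3 example: gray background at 10 m, red mesh agent layer.
gray: Pixel = (0.5, 0.5, 0.5)
red: Pixel = (1.0, 0.0, 0.0)
rgb, depth = composite_by_depth(
    neural_rgb=[[gray, gray, gray]],
    neural_depth=[[10.0, 10.0, 10.0]],
    mesh_rgb=[[red, red, red]],
    mesh_depth=[[None, 4.0, 25.0]],  # uncovered, in front, behind
)
```

In this toy case only the middle pixel takes the mesh color, because the mesh is absent at the first pixel and lies behind the reconstructed scene at the third. A production compositor would likely also handle soft edges, transparency, and shadows, none of which this sketch attempts.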