DriveCamSim: Generalizable Camera Simulation via Explicit Camera Modeling for Autonomous Driving
Journal: arXiv
Published Date: May 26, 2025
Abstract
Camera sensor simulation plays a critical role in autonomous driving
(AD), e.g., in evaluating vision-based AD algorithms. While existing approaches
have leveraged generative models for controllable image/video generation, they
remain constrained to generating multi-view video sequences with fixed camera
viewpoints and frame rates, significantly limiting their downstream
applications. To address this, we present DriveCamSim, a generalizable camera
simulation framework whose core innovation is the proposed Explicit
Camera Modeling (ECM) mechanism. Instead of relying on implicit interaction through
vanilla attention, ECM establishes explicit pixel-wise correspondences across
the multi-view and multi-frame dimensions, preventing the model from overfitting to
the specific camera configurations (intrinsic/extrinsic parameters, number of
views) and temporal sampling rates present in the training data. For
controllable generation, we identify the information loss inherent in
existing conditional encoding and injection pipelines and propose an
information-preserving control mechanism. This control mechanism not only
improves conditional controllability but can also be extended to be
identity-aware, enhancing temporal consistency in foreground object rendering.
With the above designs, our model demonstrates superior performance in both visual
quality and controllability, as well as generalization at both the spatial
level (variations in camera parameters) and the temporal level (variations in
video frame rate), enabling flexible, user-customizable camera simulation
tailored to diverse application scenarios. Code will be available at
https://github.com/swc-17/DriveCamSim to facilitate future research.
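The abstract does not say how ECM computes its explicit pixel-wise correspondences. As a hedged illustration only, the sketch below shows the standard pinhole geometry that such a mechanism could build on. A pixel in one camera is lifted to 3D at an assumed depth, carried through the camera extrinsics, and projected into another camera's image. The helper name `correspond`, the per-pixel depth input, and the 4x4 camera-to-ego extrinsic convention are assumptions made for this example. It is not DriveCamSim's actual implementation.

```python
import numpy as np


def correspond(uv, depth, K_src, T_src, K_dst, T_dst):
    """Map pixels from a source camera into a destination camera.

    Hypothetical helper for illustration only.
    uv: (N, 2) pixel coordinates in the source image.
    depth: (N,) assumed depths along the source camera's z-axis.
    K_src, K_dst: (3, 3) pinhole intrinsics.
    T_src, T_dst: (4, 4) camera-to-ego extrinsics (assumed convention).
    Returns (N, 2) destination pixel coordinates and an (N,) boolean mask
    marking points that lie in front of the destination camera.
    """
    n = uv.shape[0]
    # Homogeneous pixel coordinates -> rays with z = 1 in the source camera frame.
    pix = np.concatenate([uv, np.ones((n, 1))], axis=1)
    rays = (np.linalg.inv(K_src) @ pix.T).T
    pts_src = rays * depth[:, None]
    # Source camera frame -> ego frame -> destination camera frame.
    pts_h = np.concatenate([pts_src, np.ones((n, 1))], axis=1)
    pts_dst = (np.linalg.inv(T_dst) @ T_src @ pts_h.T).T[:, :3]
    valid = pts_dst[:, 2] > 1e-6
    # Project with the destination intrinsics. Points behind the camera get a
    # dummy divisor and are flagged invalid instead of being divided by z <= 0.
    proj = (K_dst @ pts_dst.T).T
    z = np.where(valid, proj[:, 2], 1.0)
    uv_dst = proj[:, :2] / z[:, None]
    return uv_dst, valid


if __name__ == "__main__":
    K = np.array([[100.0, 0.0, 64.0], [0.0, 100.0, 64.0], [0.0, 0.0, 1.0]])
    T_src = np.eye(4)
    T_dst = np.eye(4)
    T_dst[0, 3] = 1.0  # destination camera shifted 1 m along the ego x-axis
    uv_dst, valid = correspond(np.array([[64.0, 64.0]]), np.array([10.0]),
                               K, T_src, K, T_dst)
    print(uv_dst, valid)
```

The mapping is computed from each camera's own intrinsics and extrinsics rather than from learned per-view embeddings. It therefore applies unchanged to rigs with different camera parameters or numbers of views. Composing the extrinsics with ego poses taken at two different timestamps would give multi-frame correspondences in the same way. That is one plausible route to the frame-rate independence the abstract describes, although the paper's exact formulation may differ.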