STD-GS: Exploring Frame-Event Interaction for SpatioTemporal-Disentangled Gaussian Splatting to Reconstruct High-Dynamic Scene
Journal:
arXiv
Published Date:
Jun 29, 2025
Abstract
High-dynamic scene reconstruction aims to represent static background with
rigid spatial features and dynamic objects with deformed continuous
spatiotemporal features. Typically, existing methods adopt unified
representation model (e.g., Gaussian) to directly match the spatiotemporal
features of dynamic scene from frame camera. However, this unified paradigm
fails in the potential discontinuous temporal features of objects due to frame
imaging and the heterogeneous spatial features between background and objects.
To address this issue, we disentangle the spatiotemporal features into various
latent representations to alleviate the spatiotemporal mismatching between
background and objects. In this work, we introduce event camera to compensate
for frame camera, and propose a spatiotemporal-disentangled Gaussian splatting
framework for high-dynamic scene reconstruction. As for dynamic scene, we
figure out that background and objects have appearance discrepancy in
frame-based spatial features and motion discrepancy in event-based temporal
features, which motivates us to distinguish the spatiotemporal features between
background and objects via clustering. As for dynamic object, we discover that
Gaussian representations and event data share the consistent spatiotemporal
characteristic, which could serve as a prior to guide the spatiotemporal
disentanglement of object Gaussians. Within Gaussian splatting framework, the
cumulative scene-object disentanglement can improve the spatiotemporal
discrimination between background and objects to render the time-continuous
dynamic scene. Extensive experiments have been performed to verify the
superiority of the proposed method.