ChangeBridge: Spatiotemporal Image Generation with Multimodal Controls for Remote Sensing
Journal: arXiv
Published Date: Jul 7, 2025
Abstract
Recent generative methods, especially diffusion models, have made great
progress in remote sensing image synthesis. However, existing methods have
not explored the simulation of future
scenarios based on given scenario images. This simulation capability has wide
applications in urban planning, land management, and beyond. In this work, we
propose
ChangeBridge, a conditional spatiotemporal diffusion model. Given pre-event
images and conditioned on multimodal spatial controls (e.g., text prompts,
instance layouts, and semantic maps), ChangeBridge can synthesize post-event
images. The core idea behind ChangeBridge is to reformulate the noise-to-image
diffusion model as a pre-to-post diffusion bridge. Conditioned on multimodal
controls, ChangeBridge leverages a stochastic Brownian-bridge diffusion,
directly modeling the spatiotemporal evolution between pre-event and post-event
states. To the best of our knowledge, ChangeBridge is the first spatiotemporal
generative model with multimodal controls for remote sensing. Experimental
results demonstrate that ChangeBridge can simulate high-fidelity future
scenarios aligned with given conditions, including event and event-driven
background variations. Code will be available.
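
To make the bridge idea concrete, the following is a minimal sketch of sampling an intermediate state from a textbook Brownian bridge pinned at a pre-event image and a post-event image. It is not the authors' implementation: the function name brownian_bridge_sample, the scale parameter, the choice of placing the pre-event image at t = 0, and the use of flat pixel lists in place of image tensors are all assumptions made for illustration, and the abstract does not specify the actual noise schedule or how the multimodal controls enter the model.

```python
import math
import random


def brownian_bridge_sample(x_pre, x_post, t, T, scale=1.0, rng=None):
    """Sample one intermediate state of a Brownian bridge.

    The bridge is pinned at x_pre when t == 0 and at x_post when t == T.
    Its mean interpolates linearly between the endpoints, and its variance
    scale * m * (1 - m), with m = t / T, vanishes at both endpoints.
    All names and the schedule here are hypothetical, not from the paper.
    """
    if T <= 0 or not 0 <= t <= T:
        raise ValueError("require T > 0 and 0 <= t <= T")
    if len(x_pre) != len(x_post):
        raise ValueError("pre- and post-event images must have the same size")
    rng = rng or random.Random()

    m = t / T                                # normalized time in [0, 1]
    std = math.sqrt(scale * m * (1.0 - m))   # zero at t = 0 and t = T

    # Per-pixel: linear interpolation of the endpoints plus Gaussian noise.
    return [
        (1.0 - m) * a + m * b + std * rng.gauss(0.0, 1.0)
        for a, b in zip(x_pre, x_post)
    ]


if __name__ == "__main__":
    pre = [0.0, 0.5, 1.0]     # toy "pre-event" pixel values
    post = [1.0, 0.25, 0.0]   # toy "post-event" pixel values
    rng = random.Random(0)
    for step in (0, 5, 10):
        print(step, brownian_bridge_sample(pre, post, step, 10, rng=rng))
```

Unlike a standard noise-to-image process, both endpoints here are images, so a sample at t = 0 or t = T carries no noise at all. A learned model would presumably be trained to traverse such a path toward the post-event endpoint, conditioned on the controls; that part is outside the scope of this sketch.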