DPBridge: Latent Diffusion Bridge for Dense Prediction
Journal: arXiv
Published Date: Dec 29, 2024
Abstract
Diffusion models demonstrate remarkable capabilities in capturing complex
data distributions and have achieved compelling results in many generative
tasks. While they have recently been extended to dense prediction tasks such as
depth estimation and surface normal prediction, their full potential in this
area remains under-explored. In dense prediction settings, target signal maps
and input images are pixel-wise aligned. This makes the conventional noise-to-data
generation paradigm inefficient, as input images can serve as a more informative
prior than pure noise. Diffusion bridge models, which support
data-to-data generation between two general data distributions, offer a
promising alternative, but they typically fail to exploit the rich visual
priors embedded in large pretrained foundation models. To address these
limitations, we integrate the diffusion bridge formulation with structured visual
priors and introduce DPBridge, the first latent diffusion bridge framework for
dense prediction tasks. Our method makes three key contributions: (1) a
tractable reverse transition kernel for the diffusion bridge process, enabling a
maximum likelihood training scheme for better compatibility with pretrained
backbones; (2) a distribution-aligned normalization technique to mitigate the
discrepancies between the bridge and standard diffusion processes; and (3) an
auxiliary image consistency loss to preserve fine-grained details. Experiments
across extensive benchmarks validate that our method consistently achieves
superior performance, demonstrating its effectiveness and generalization
capability under different scenarios.
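
To make the data-to-data idea concrete, the sketch below samples an intermediate state of a generic Brownian bridge pinned at two endpoints. It is only an illustration under stated assumptions and is not DPBridge's actual formulation: the function name `bridge_sample`, the unit time horizon, the constant `sigma`, and the use of flat Python lists as stand-ins for latents are all hypothetical choices. Here `x0` plays the role of the target map latent and `xT` the role of the input image latent.

```python
import math
import random


def bridge_sample(x0, xT, t, sigma=1.0, rng=None):
    """Sample x_t from a Brownian bridge pinned at x0 (t=0) and xT (t=1).

    Assumption: this is the textbook Brownian bridge marginal,
        x_t = (1 - t) * x0 + t * xT + sigma * sqrt(t * (1 - t)) * eps,
    with eps ~ N(0, 1) per element. It is not taken from the paper.
    """
    if not 0.0 <= t <= 1.0:
        raise ValueError("t must lie in [0, 1]")
    rng = rng or random.Random()
    # Noise scale vanishes at both endpoints, so the process starts
    # exactly at x0 and ends exactly at xT.
    std = sigma * math.sqrt(t * (1.0 - t))
    return [
        (1.0 - t) * a + t * b + std * rng.gauss(0.0, 1.0)
        for a, b in zip(x0, xT)
    ]


# Hypothetical flattened latents: target map (x0) and input image (xT).
target_latent = [1.0, 2.0]
image_latent = [3.0, 4.0]
midpoint = bridge_sample(target_latent, image_latent, t=0.5, rng=random.Random(0))
```

Unlike a noise-to-data process, whose terminal state is pure Gaussian noise, both ends of this process are data: the state at `t=1` is exactly the image latent, so generation can start from the informative prior the abstract describes.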