PPS-Ctrl: Controllable Sim-to-Real Translation for Colonoscopy Depth Estimation
Journal:
arXiv
Published Date:
Apr 23, 2025
Abstract
Accurate depth estimation enhances endoscopy navigation and diagnostics, but
obtaining ground-truth depth in clinical settings is challenging. Synthetic
datasets are often used for training, yet the domain gap limits generalization
to real data. We propose a novel image-to-image translation framework that
preserves structure while generating realistic textures from clinical data. Our
key innovation integrates Stable Diffusion with ControlNet, conditioned on a
latent representation extracted from a Per-Pixel Shading (PPS) map. PPS
captures surface lighting effects, providing a stronger structural constraint
than depth maps. Experiments show our approach produces more realistic
translations and improves depth estimation over GAN-based MI-CycleGAN. Our code
is publicly accessible at https://github.com/anaxqx/PPS-Ctrl.