Mono2Stereo: A Benchmark and Empirical Study for Stereo Conversion
Journal: arXiv
Published Date: Mar 28, 2025
Abstract
With the rapid proliferation of 3D devices and the shortage of 3D content,
stereo conversion is attracting increasing attention. Recent works introduce
pretrained Diffusion Models (DMs) into this task. However, due to the scarcity
of large-scale training data and comprehensive benchmarks, the optimal
methodologies for employing DMs in stereo conversion and the accurate
evaluation of stereo effects remain largely unexplored. In this work, we
introduce the Mono2Stereo dataset, providing high-quality training data and a
benchmark to support in-depth exploration of stereo conversion. With this
dataset, we conduct an empirical study that yields two primary findings. 1) The
differences between the left and right views are subtle, yet existing metrics
are computed over all pixels and fail to concentrate on the regions critical to
the stereo effect. 2) Mainstream methods adopt either one-stage left-to-right
generation or a warp-and-inpaint pipeline, which suffer from a degraded stereo
effect and image distortion, respectively. Based on these findings, we introduce a new
evaluation metric, Stereo Intersection-over-Union, which prioritizes disparity
and achieves a high correlation with human judgments of the stereo effect.
Moreover, we propose a strong baseline model that balances stereo effect and
image quality and notably surpasses current mainstream
methods. Our code and data will be open-sourced to promote further research in
stereo conversion. Our models are available at mono2stereo-bench.github.io.
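
For readers unfamiliar with the warp-and-inpaint pipeline named in the abstract, the following is a minimal sketch of the general idea, not the authors' implementation. It assumes integer per-pixel disparities and images stored as nested lists; the function names warp_left_to_right and fill_holes_naive are hypothetical, and the hole filling is a simple stand-in for the learned inpainting model a real pipeline would use.

```python
def warp_left_to_right(left, disparity):
    """Forward-warp a left view into a right view (illustrative sketch).

    left: 2D list of pixel values, rows of equal length.
    disparity: 2D list of non-negative integer disparities, same shape.
    Returns (right, hole_mask); hole_mask[y][x] is True where no pixel landed.
    """
    height, width = len(left), len(left[0])
    right = [[None] * width for _ in range(height)]
    # Largest disparity written to each target pixel so far (-1 means empty).
    best = [[-1] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            d = disparity[y][x]
            tx = x - d  # a point at column x in the left view moves to x - d
            # Nearer surfaces (larger disparity) win when two pixels collide.
            if 0 <= tx < width and d > best[y][tx]:
                right[y][tx] = left[y][x]
                best[y][tx] = d
    hole_mask = [[value is None for value in row] for row in right]
    return right, hole_mask


def fill_holes_naive(right, hole_mask):
    """Placeholder inpainting: copy the nearest valid pixel on the same row.

    The pixel to the right is tried first because disoccluded regions in the
    right view usually border background on that side; otherwise fall back
    to the nearest valid pixel on the left.
    """
    filled = [row[:] for row in right]
    for y, row in enumerate(right):
        width = len(row)
        for x in range(width):
            if not hole_mask[y][x]:
                continue
            candidates = [i for i in range(x + 1, width) if not hole_mask[y][i]]
            if not candidates:
                candidates = [i for i in range(x - 1, -1, -1) if not hole_mask[y][i]]
            if candidates:
                filled[y][x] = row[candidates[0]]
    return filled


# Toy example: one row in which a single "foreground" pixel has disparity 2.
left = [[10, 20, 30, 40, 50]]
disparity = [[0, 0, 2, 0, 0]]
right, hole_mask = warp_left_to_right(left, disparity)
filled = fill_holes_naive(right, hole_mask)
```

In this toy row the foreground pixel shifts two columns to the left and leaves a hole at its old position, which the placeholder fills from the neighboring background. The image distortion the abstract attributes to this family of methods plausibly arises at these two steps: errors in the estimated disparity misplace pixels during warping, and the inpainting model must invent content for the holes.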