M2SVid: End-to-End Inpainting and Refinement for Monocular-to-Stereo Video Conversion

Journal: arXiv

Published Date: May 22, 2025

Abstract

We tackle the problem of monocular-to-stereo video conversion and propose a novel architecture for inpainting and refinement of the warped right view obtained by depth-based reprojection of the input left view. We extend the Stable Video Diffusion (SVD) model to use the input left video, the warped right video, and the disocclusion masks as conditioning inputs to generate a high-quality right camera view. To effectively exploit information from neighboring frames for inpainting, we modify the attention layers in SVD to compute full attention for disoccluded pixels. Our model is trained to generate the right view video in an end-to-end manner by minimizing image-space losses to ensure high-quality generation. Our approach outperforms previous state-of-the-art methods, obtaining an average rank of 1.43 among the 4 compared methods in a user study, while being 6x faster than the second-placed method.
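The abstract does not say how the depth-based reprojection is implemented. The snippet below is only a generic sketch of the standard idea, not the authors' code. It forward-warps a left frame to the right view and derives a disocclusion mask. It assumes a rectified stereo pair with a purely horizontal baseline and a known focal length and baseline, so that disparity = focal * baseline / depth. It also assumes nearest-pixel splatting with a z-buffer. The function name `warp_left_to_right` is hypothetical.

```python
import numpy as np


def warp_left_to_right(left, depth, focal, baseline):
    """Hypothetical sketch: forward-warp a left frame into the right view.

    Assumes rectified stereo with a horizontal baseline.
    left:  (H, W, C) float array, the left-view frame.
    depth: (H, W) array of positive depths for the left view.
    Returns (warped, mask): the warped right view and a boolean
    disocclusion mask that is True where no left pixel landed.
    """
    h, w = depth.shape
    # Closer points have larger disparity and shift further to the left.
    disparity = focal * baseline / depth
    warped = np.zeros_like(left)
    # Z-buffer in disparity units: the largest disparity (nearest point) wins.
    best = np.full((h, w), -np.inf)
    for y in range(h):
        for x in range(w):
            xr = int(round(x - disparity[y, x]))  # target column in the right view
            if 0 <= xr < w and disparity[y, x] > best[y, xr]:
                best[y, xr] = disparity[y, x]
                warped[y, xr] = left[y, x]
    # Pixels that never received a source pixel are disoccluded holes,
    # i.e. the regions an inpainting model has to fill in.
    mask = np.isinf(best)
    return warped, mask
```

As a usage example, a 1x4 frame with values 1 to 4 can be warped using a constant depth of 1 (disparity 1). The content then shifts one pixel to the left, and the rightmost column is marked as disoccluded. If instead the third pixel is moved closer (depth 0.5), it overwrites the first target column, leaving a hole behind the foreground.

```python
left = np.arange(1, 5, dtype=float).reshape(1, 4, 1)
warped, mask = warp_left_to_right(left, np.ones((1, 4)), focal=1.0, baseline=1.0)
# warped[0, :, 0] -> [2., 3., 4., 0.];  mask[0] -> [False, False, False, True]
```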

Authors

Nina Shvetsova
Goutam Bhat
Prune Truong
Hilde Kuehne
Federico Tombari

External Resources

View on arXiv (http://arxiv.org/abs/2505.16565v1)
