The Less You Depend, The More You Learn: Synthesizing Novel Views from Sparse, Unposed Images without Any 3D Knowledge
Journal:
arXiv
Published Date:
Jun 11, 2025
Abstract
We consider the problem of generalizable novel view synthesis (NVS), which
aims to generate photorealistic novel views from sparse or even unposed 2D
images without per-scene optimization. This task remains fundamentally
challenging, as it requires inferring 3D structure from incomplete and
ambiguous 2D observations. Early approaches typically rely on strong 3D
knowledge, including architectural 3D inductive biases (e.g., embedding
explicit 3D representations, such as NeRF or 3DGS, into network design) and
ground-truth camera poses for both input and target views. While recent efforts
have sought to reduce the 3D inductive bias or the dependence on known camera
poses of input views, critical questions regarding the role of 3D knowledge and
the necessity of circumventing its use remain under-explored. In this work, we
conduct a systematic analysis on the 3D knowledge and uncover a critical trend:
the performance of methods that requires less 3D knowledge accelerates more as
data scales, eventually achieving performance on par with their 3D
knowledge-driven counterparts, which highlights the increasing importance of
reducing dependence on 3D knowledge in the era of large-scale data. Motivated
by and following this trend, we propose a novel NVS framework that minimizes 3D
inductive bias and pose dependence for both input and target views. By
eliminating this 3D knowledge, our method fully leverages data scaling and
learns implicit 3D awareness directly from sparse 2D images, without any 3D
inductive bias or pose annotation during training. Extensive experiments
demonstrate that our model generates photorealistic and 3D-consistent novel
views, achieving even comparable performance with methods that rely on posed
inputs, thereby validating the feasibility and effectiveness of our
data-centric paradigm. Project page:
https://pku-vcl-geometry.github.io/Less3Depend/ .