ViT-NeBLa: A Hybrid Vision Transformer and Neural Beer-Lambert Framework for Single-View 3D Reconstruction of Oral Anatomy from Panoramic Radiographs
Journal: arXiv
Published Date: Jun 16, 2025
Abstract
Dental diagnosis relies on two primary imaging modalities: panoramic
radiographs (PX), which provide 2D representations of the oral cavity, and
Cone-Beam Computed Tomography (CBCT), which offers detailed 3D anatomical
information. While PX images are cost-effective and accessible, their lack of
depth information limits diagnostic accuracy. CBCT supplies this depth
information but has drawbacks, including higher cost, greater radiation
exposure, and limited accessibility. Existing reconstruction models further
complicate the process by requiring CBCT flattening or prior dental arch
information, which is often unavailable in clinical practice. We introduce
ViT-NeBLa, a vision transformer-based Neural Beer-Lambert model that enables
accurate 3D reconstruction directly from a single PX image.
Our key innovations are: (1) enhancing the NeBLa framework with Vision
Transformers to improve reconstruction without requiring CBCT flattening or
prior dental arch information; (2) a novel horseshoe-shaped point sampling
strategy whose non-intersecting rays remove the intermediate density
aggregation that existing models need because their rays intersect, reducing
sampling-point computations by 52%; (3) replacing the CNN-based U-Net with a
hybrid ViT-CNN architecture for superior global and local feature extraction;
and (4) learnable hash positional encoding, which gives 3D sample points a
richer high-dimensional representation than existing Fourier-based dense
positional encoding.
Experiments demonstrate that ViT-NeBLa significantly outperforms prior
state-of-the-art methods both quantitatively and qualitatively, offering a
cost-effective, radiation-efficient alternative for enhanced dental
diagnostics.
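The Neural Beer-Lambert formulation rests on the Beer-Lambert law: an X-ray of initial intensity I0 that crosses material with attenuation coefficient mu is attenuated to I = I0 * exp(-integral of mu ds). The minimal sketch below discretizes that integral over sampled points along each ray, assuming per-sample densities and step lengths are already available. The function name is hypothetical, and how ViT-NeBLa maps transmitted intensity to PX pixel values is not specified here, so this illustrates the physics rather than the authors' implementation.

```python
import numpy as np


def transmitted_intensity(densities, step_lengths, i0=1.0):
    """Discrete Beer-Lambert law: I = I0 * exp(-sum_i mu_i * delta_i).

    densities:    (..., num_samples) attenuation values mu_i along each ray
    step_lengths: (num_samples,) or (..., num_samples) distances delta_i
    """
    densities = np.asarray(densities, dtype=float)
    step_lengths = np.asarray(step_lengths, dtype=float)
    # Optical depth: attenuation accumulated over all samples of each ray.
    optical_depth = np.sum(densities * step_lengths, axis=-1)
    return i0 * np.exp(-optical_depth)


# Usage: two rays with four samples each and a uniform step of 0.5.
mu = np.array([[0.0, 0.0, 0.0, 0.0],   # empty space: no attenuation
               [0.2, 0.8, 0.8, 0.2]])  # denser material in the middle
result = transmitted_intensity(mu, np.full(4, 0.5))
# first ray: 1.0, second ray: exp(-1), roughly 0.368
```

Because the sum is taken over the last axis, a whole detector row of rays can be rendered in one vectorized call.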
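The horseshoe-shaped sampling with non-intersecting rays can be made concrete with a small geometric sketch. It assumes, purely for illustration, that the dental arch in the axial plane is the parabola y = a * x**2 and that each ray runs along the curve's unit normal, extending `depth` units to either side. `horseshoe_samples`, `a`, `half_width`, and `depth` are hypothetical names and values, not taken from the paper. Normal segments of this convex curve cannot cross as long as their half-length stays below its minimum radius of curvature, 1/(2a) at the vertex, so the function rejects larger depths.

```python
import numpy as np


def horseshoe_samples(num_rays, num_samples, a=0.02, half_width=40.0, depth=10.0):
    """Hypothetical horseshoe sampling in the axial (x, y) plane.

    Rays run along the unit normals of the parabola y = a * x**2, one ray per
    arch position, with num_samples points spanning [-depth, depth] on each ray.
    Returns an array of shape (num_rays, num_samples, 2).
    """
    # Minimum radius of curvature of y = a x^2 is 1/(2a), at the vertex.
    # Normal segments shorter than this on each side cannot cross.
    if depth >= 1.0 / (2.0 * a):
        raise ValueError("depth must be below 1/(2a) to keep rays non-intersecting")
    x = np.linspace(-half_width, half_width, num_rays)
    y = a * x**2
    # The tangent is (1, 2ax); rotating it by 90 degrees gives the normal (-2ax, 1).
    nx, ny = -2.0 * a * x, np.ones_like(x)
    norm = np.hypot(nx, ny)
    nx, ny = nx / norm, ny / norm
    offsets = np.linspace(-depth, depth, num_samples)
    px = x[:, None] + offsets[None, :] * nx[:, None]
    py = y[:, None] + offsets[None, :] * ny[:, None]
    return np.stack([px, py], axis=-1)


pts = horseshoe_samples(num_rays=64, num_samples=32)
print(pts.shape)  # (64, 32, 2)
```

In 3D, the same layout would be repeated at each vertical position, so every sample point belongs to exactly one ray and no densities from overlapping rays need to be aggregated.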
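The contrast between learnable hash positional encoding and Fourier-based dense positional encoding can be sketched as follows. The hash encoder follows the multiresolution scheme popularized by Instant-NGP, which is an assumption about the variant ViT-NeBLa uses: at each resolution level, the eight voxel corners around a point are hashed into a table of feature vectors and trilinearly interpolated. During training the tables would be learnable parameters updated by an autodiff framework; here they are plain NumPy arrays, and all names and defaults (`n_min`, `growth`, table sizes) are hypothetical.

```python
import numpy as np

# Per-axis primes from the Instant-NGP spatial hash (the first axis uses 1).
PRIMES = np.array([1, 2654435761, 805459861], dtype=np.uint64)


def spatial_hash(coords, table_size):
    """Hash integer 3D grid coordinates (..., 3) into [0, table_size)."""
    c = coords.astype(np.uint64)  # uint64 products wrap around on overflow
    h = (c[..., 0] * PRIMES[0]) ^ (c[..., 1] * PRIMES[1]) ^ (c[..., 2] * PRIMES[2])
    return (h % np.uint64(table_size)).astype(np.int64)


def hash_encode(points, tables, n_min=16, growth=1.5):
    """points: (P, 3) in [0, 1]; tables: list of (T, F) feature tables, one per level."""
    feats = []
    for level, table in enumerate(tables):
        res = int(np.floor(n_min * growth**level))  # grid resolution at this level
        scaled = points * res
        base = np.floor(scaled).astype(np.int64)  # lower voxel corner
        frac = scaled - base                       # position inside the voxel
        out = np.zeros((points.shape[0], table.shape[1]))
        for corner in range(8):
            offset = np.array([(corner >> k) & 1 for k in range(3)])
            idx = spatial_hash(base + offset, table.shape[0])
            # Trilinear weight: frac on axes where the corner is "high", 1 - frac otherwise.
            w = np.prod(np.where(offset == 1, frac, 1.0 - frac), axis=-1)
            out += w[:, None] * table[idx]
        feats.append(out)
    return np.concatenate(feats, axis=-1)


def fourier_encode(points, num_freqs=6):
    """Fixed dense encoding: sin/cos of each coordinate at octave frequencies."""
    freqs = np.pi * 2.0 ** np.arange(num_freqs)
    angles = points[..., None] * freqs  # (P, 3, num_freqs)
    enc = np.concatenate([np.sin(angles), np.cos(angles)], axis=-1)
    return enc.reshape(points.shape[0], -1)


rng = np.random.default_rng(0)
tables = [rng.normal(scale=1e-4, size=(2**14, 2)) for _ in range(8)]  # learnable in practice
points = rng.random((1000, 3))
print(hash_encode(points, tables).shape)  # (1000, 16)
print(fourier_encode(points).shape)       # (1000, 36)
```

The Fourier encoding applies the same fixed sinusoidal basis to every point, whereas the hash encoding stores trainable features locally, so representational capacity can concentrate where fine anatomical detail is needed.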