TransiT: Transient Transformer for Non-line-of-sight Videography
Journal:
arXiv
Published Date:
Mar 14, 2025
Abstract
High quality and high speed videography using Non-Line-of-Sight (NLOS)
imaging benefit autonomous navigation, collision prevention, and post-disaster
search and rescue tasks. Current solutions have to balance between the frame
rate and image quality. High frame rates, for example, can be achieved by
reducing either per-point scanning time or scanning density, but at the cost of
lowering the information density at individual frames. Fast scanning process
further reduces the signal-to-noise ratio and different scanning systems
exhibit different distortion characteristics. In this work, we design and
employ a new Transient Transformer architecture called TransiT to achieve
real-time NLOS recovery under fast scans. TransiT directly compresses the
temporal dimension of input transients to extract features, reducing
computation costs and meeting high frame rate requirements. It further adopts a
feature fusion mechanism as well as employs a spatial-temporal Transformer to
help capture features of NLOS transient videos. Moreover, TransiT applies
transfer learning to bridge the gap between synthetic and real-measured data.
In real experiments, TransiT manages to reconstruct from sparse transients of
$16 \times 16$ measured at an exposure time of 0.4 ms per point to NLOS videos
at a $64 \times 64$ resolution at 10 frames per second. We will make our code
and dataset available to the community.