LXLv2: Enhanced LiDAR Excluded Lean 3D Object Detection with Fusion of 4D Radar and Camera
Journal: arXiv
Published Date: Feb 20, 2025
Abstract
LXL, the previous state-of-the-art method for 3D object detection based on 4D
radar-camera fusion, uses predicted image depth distribution maps and radar 3D
occupancy grids to assist its sampling-based image view transformation.
However, its depth prediction lacks accuracy and consistency, and its
concatenation-based fusion impedes model robustness. In this work, we propose
LXLv2, which modifies LXL to overcome these limitations and improve
performance. Specifically, considering the position
error in radar measurements, we devise a one-to-many depth supervision strategy
via radar points, where the radar cross section (RCS) value is further
exploited to adjust the supervision area for object-level depth consistency.
Additionally, a channel and spatial attention-based fusion module named
CSAFusion is introduced to improve feature adaptiveness. Experimental results
on the View-of-Delft and TJ4DRadSet datasets show that the proposed LXLv2
outperforms LXL in detection accuracy, inference speed and robustness,
demonstrating the effectiveness of the model.
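
To make the one-to-many depth supervision idea concrete, the following is a minimal sketch and not the paper's implementation. It assumes radar points have already been projected to integer pixel coordinates, that RCS is given in dBsm, and that the supervised area is a square patch whose radius grows linearly with RCS. The function names, the linear mapping, its constants, and the rule of keeping the nearest depth where patches overlap are all hypothetical choices made for illustration.

```python
def rcs_to_radius(rcs_dbsm, base_radius=1, scale=0.25, max_radius=6):
    """Hypothetical mapping: a larger RCS gives a larger supervision radius (pixels)."""
    radius = base_radius + scale * max(rcs_dbsm, 0.0)
    return min(int(radius), max_radius)


def build_depth_targets(radar_points, height, width):
    """Spread each radar depth over a patch of pixels (one-to-many supervision).

    radar_points: iterable of (u, v, depth_m, rcs_dbsm), with (u, v) the
    projected pixel column and row. Returns a height x width grid holding a
    depth target in metres, or None where no supervision is applied.
    """
    targets = [[None] * width for _ in range(height)]
    for u, v, depth, rcs in radar_points:
        r = rcs_to_radius(rcs)
        # Clip the square patch to the image bounds.
        for y in range(max(0, v - r), min(height, v + r + 1)):
            for x in range(max(0, u - r), min(width, u + r + 1)):
                # Assumption: where patches overlap, keep the nearer depth.
                if targets[y][x] is None or depth < targets[y][x]:
                    targets[y][x] = depth
    return targets


points = [(2, 2, 10.0, 0.0), (8, 8, 4.0, 8.0)]
targets = build_depth_targets(points, height=12, width=12)
supervised = sum(cell is not None for row in targets for cell in row)
```

With these assumed defaults, the point with RCS 0 supervises a 3x3 patch and the point with RCS 8 supervises a 7x7 patch, so a single radar return constrains the predicted depth over an object-sized region rather than at one pixel.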
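
The contrast between plain concatenation and attention-based fusion can be illustrated with a small sketch. This is not the CSAFusion architecture, whose layers are not described in the abstract. It is a generic, parameter-free channel-then-spatial gating written with nested lists, where learned layers would normally produce the attention weights. All function names are hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def channel_attention(feat):
    """feat: list of C channels, each an H x W nested list. One weight per channel."""
    weights = []
    for channel in feat:
        values = [v for row in channel for v in row]
        # Global average pooling followed by a sigmoid gate (no learned layers here).
        weights.append(sigmoid(sum(values) / len(values)))
    return weights


def spatial_attention(feat):
    """Return an H x W map of weights computed from the channel-wise mean."""
    c, h, w = len(feat), len(feat[0]), len(feat[0][0])
    return [[sigmoid(sum(feat[k][y][x] for k in range(c)) / c)
             for x in range(w)] for y in range(h)]


def attention_fuse(img_feat, radar_feat):
    """Concatenate along channels, then reweight per channel and per location."""
    stacked = img_feat + radar_feat  # channel concatenation of the two modalities
    ch_w = channel_attention(stacked)
    reweighted = [[[ch_w[k] * v for v in row] for row in ch]
                  for k, ch in enumerate(stacked)]
    sp_w = spatial_attention(reweighted)
    return [[[sp_w[y][x] * v for x, v in enumerate(row)]
             for y, row in enumerate(ch)] for ch in reweighted]


img_feat = [[[0.0, 0.0], [0.0, 0.0]]]    # one uninformative image channel
radar_feat = [[[2.0, 2.0], [2.0, 2.0]]]  # one strongly activated radar channel
fused = attention_fuse(img_feat, radar_feat)
```

Unlike plain concatenation, which passes both modalities through unchanged, the gates scale each channel and each location by a weight between 0 and 1, so the contribution of a modality can vary with its own activations.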