UAV-DETR: Efficient End-to-End Object Detection for Unmanned Aerial Vehicle Imagery
Journal: arXiv
Published Date: Jan 3, 2025
Abstract
Unmanned aerial vehicle object detection (UAV-OD) has been widely used in
various scenarios. However, most existing UAV-OD algorithms rely on manually
designed components, which require extensive tuning. End-to-end models that
avoid such components are designed mainly for natural images and are less
effective on UAV imagery. To address these challenges,
this paper proposes an efficient detection transformer (DETR) framework
tailored for UAV imagery, i.e., UAV-DETR. The framework includes a multi-scale
feature fusion with frequency enhancement module, which captures both spatial
and frequency information at different scales. In addition, a frequency-focused
down-sampling module is presented to retain critical spatial details during
down-sampling. A semantic alignment and calibration module is developed to
align and fuse features from different fusion paths. Experimental results
demonstrate the effectiveness and generalization of our approach across various
UAV imagery datasets. On the VisDrone dataset, our method improves AP by 3.1\%
and $\text{AP}_{50}$ by 4.2\% over the baseline. Similar improvements are
observed on the UAVVaste dataset. The project page is available at
https://github.com/ValiantDiligent/UAV-DETR
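
The abstract does not specify how the frequency-focused down-sampling module works. As a generic illustration of why frequency information matters when halving resolution, the sketch below applies a one-level 2D Haar split in pure Python. This is an assumed stand-in technique, not the UAV-DETR module, and the function names `haar_downsample` and `haar_upsample` are hypothetical. It shows that plain 2x2 averaging blurs a single-pixel object, while the three difference maps keep enough detail to recover the input exactly.

```python
def haar_downsample(image):
    """One-level 2D Haar split (illustrative only, not the paper's module).

    `image` is a list of equal-length rows with even height and width.
    Returns four half-resolution maps: (low, col_diff, row_diff, diag_diff).
    """
    height, width = len(image), len(image[0])
    if height % 2 or width % 2:
        raise ValueError("height and width must be even")
    low, col_diff, row_diff, diag_diff = [], [], [], []
    for r in range(0, height, 2):
        lo_row, c_row, r_row, d_row = [], [], [], []
        for c in range(0, width, 2):
            a, b = image[r][c], image[r][c + 1]
            d, e = image[r + 1][c], image[r + 1][c + 1]
            lo_row.append((a + b + d + e) / 4)  # 2x2 average (low frequency)
            c_row.append((a - b + d - e) / 4)   # change between the two columns
            r_row.append((a + b - d - e) / 4)   # change between the two rows
            d_row.append((a - b - d + e) / 4)   # diagonal change
        low.append(lo_row)
        col_diff.append(c_row)
        row_diff.append(r_row)
        diag_diff.append(d_row)
    return low, col_diff, row_diff, diag_diff


def haar_upsample(low, col_diff, row_diff, diag_diff):
    """Invert haar_downsample, recovering the full-resolution image."""
    image = []
    for lo_row, c_row, r_row, d_row in zip(low, col_diff, row_diff, diag_diff):
        top, bottom = [], []
        for l, c, r, d in zip(lo_row, c_row, r_row, d_row):
            top.extend([l + c + r + d, l - c + r - d])
            bottom.extend([l + c - r - d, l - c - r + d])
        image.append(top)
        image.append(bottom)
    return image


# A 4x4 image containing one bright pixel, standing in for a tiny object.
tiny_object = [
    [0, 0, 0, 0],
    [0, 8, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]
low, col_diff, row_diff, diag_diff = haar_downsample(tiny_object)
restored = haar_upsample(low, col_diff, row_diff, diag_diff)
```

Here `low` alone spreads the bright pixel into a weaker 2x2 average, which is what ordinary average pooling would keep. Carrying the three difference maps alongside it makes the step lossless, at the cost of four times as many channels.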