High-Frequency Semantics and Geometric Priors for End-to-End Detection Transformers in Challenging UAV Imagery
Journal:
arXiv
Published Date:
Jul 1, 2025
Abstract
Unmanned Aerial Vehicle-based Object Detection (UAV-OD) faces substantial
challenges, including small target sizes, high-density distributions, and
cluttered backgrounds in UAV imagery. Current algorithms often depend on
hand-crafted components like anchor boxes, which demand fine-tuning and exhibit
limited generalization, and Non-Maximum Suppression (NMS), which is
threshold-sensitive and prone to misclassifying dense objects. These generic
architectures thus struggle to adapt to aerial imaging characteristics,
resulting in performance limitations. Moreover, emerging end-to-end frameworks
have yet to effectively mitigate these aerial-specific challenges.To address
these issues, we propose HEGS-DETR, a comprehensively enhanced, real-time
Detection Transformer framework tailored for UAVs. First, we introduce the
High-Frequency Enhanced Semantics Network (HFESNet) as a novel backbone.
HFESNet preserves critical high-frequency spatial details to extract robust
semantic features, thereby improving discriminative capability for small and
occluded targets in complex backgrounds. Second, our Efficient Small Object
Pyramid (ESOP) strategy strategically fuses high-resolution feature maps with
minimal computational overhead, significantly boosting small object detection.
Finally, the proposed Selective Query Recollection (SQR) and Geometry-Aware
Positional Encoding (GAPE) modules enhance the detector's decoder stability and
localization accuracy, effectively optimizing bounding boxes and providing
explicit spatial priors for dense scenes. Experiments on the VisDrone dataset
demonstrate that HEGS-DETR achieves a 5.1% AP50 and 3.8% AP increase over the
baseline, while maintaining real-time speed and reducing parameter count by 4M.