Dense Object Detection Based on De-homogenized Queries
Journal: arXiv
Published Date: Feb 11, 2025
Abstract
Dense object detection, the focus of this paper, is widely used in autonomous
driving, video surveillance, and other fields. Current detection methods based
on greedy algorithms, such as non-maximum suppression (NMS), often produce
duplicate predictions or missed detections in dense scenes. The end-to-end DETR
(DEtection TRansformer) is a type of detector that can absorb the
de-duplication capability of post-processing steps such as NMS into the
network itself. Studying it, we found that homogeneous queries in the
query-based detector reduce both the de-duplication capability of the network
and the learning efficiency of the encoder, resulting in duplicate predictions
and missed detections.
To solve this problem, we propose a learnable differentiated encoding to
de-homogenize the queries. At the same time, queries can communicate with
each other via the differentiated encoding information, which replaces the
self-attention among queries used previously. In addition, we applied a joint
loss to the output of the encoder that considers both location and confidence
predictions, giving the queries a higher-quality initialization. Without
cumbersome decoder stacking, and while maintaining accuracy, our proposed
end-to-end detection framework is more concise and reduces the number of
parameters by about 8% compared to Deformable DETR. Our method achieved
excellent results on the challenging
CrowdHuman dataset, with 93.6% average precision (AP), 39.2% MR-2, and 84.3% JI.
It outperformed previous state-of-the-art (SOTA) methods, such as Iter-E2EDet
(Progressive End-to-End Object Detection) and MIP (One proposal, Multiple
predictions). In addition, our method is more robust across scenarios with
different densities.
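
The trade-off that greedy NMS faces in crowded scenes, as described in the abstract, can be illustrated with a minimal sketch. This is not code from the paper: the box coordinates, scores, and IoU thresholds below are hypothetical values chosen only to show that a single threshold can either suppress a true neighbouring object or let a duplicate through.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes, scores, iou_threshold):
    """Greedy NMS: visit boxes by descending score and keep a box only if
    its IoU with every already-kept box is at most iou_threshold."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


# Hypothetical crowded scene (illustrative numbers, not from the paper).
boxes = [
    (0, 0, 100, 200),     # index 0: person A
    (30, 0, 130, 200),    # index 1: person B, IoU with A is about 0.54
    (-20, 30, 80, 230),   # index 2: shifted duplicate of A, IoU with A is about 0.52
]
scores = [0.95, 0.90, 0.60]

# A strict threshold removes the duplicate but also suppresses person B.
strict = greedy_nms(boxes, scores, iou_threshold=0.5)

# A looser threshold recovers person B but also keeps the duplicate of A.
loose = greedy_nms(boxes, scores, iou_threshold=0.6)

print("strict:", strict)
print("loose:", loose)
```

In this toy scene the true neighbour overlaps person A more than the duplicate does, so no single IoU threshold yields exactly the two correct boxes. This is the kind of failure that motivates moving de-duplication into the network rather than relying on a fixed post-processing rule.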