Decoding split-frequency representation for cross-scale tracking.
Journal:
Neural Networks: The Official Journal of the International Neural Network Society
Published Date:
May 22, 2025
Abstract
Learning tailored target representations is a promising direction in visual object tracking. Most state-of-the-art methods use autoencoders to generate representations by reconstructing the target's appearance. However, these reconstructions typically rely on augmentations that mimic scale jitter and alteration, neglecting physical scale observations such as those in aerial videos. This article addresses representation learning for cross-scale tracking in generalized scenarios. Specifically, we incorporate target scale directly into the positional encoding, expressing scale through relative pixel density rather than the conventional metric of image resolution. This scale-aware encoding is then integrated into the proposed asymptotic hierarchy of decoders, which reconstructs representations by emphasizing the restoration of high-frequency features at large scales and low-frequency features at tiny scales. The reconstruction is supervised with split losses, yielding robust cross-scale representations for generic objects. Extensive experiments on six benchmarks (GOT-10k, LaSOT, TrackingNet, DTB70, UAV123, and TNL2K) validate the superior performance of our method. Additionally, our tracker achieves a remarkable speed of 123 frames per second on a GPU, surpassing the previous best autoencoder-based tracker. The code and raw results will be made publicly available at https://github.com/pellab/DSC.
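To make the two core ideas in the abstract concrete, here is a minimal sketch of a scale-aware positional encoding. It is an assumption based only on the abstract's description (scale injected into the encoding, expressed as relative pixel density rather than image resolution); the function and parameter names (`scale_aware_encoding`, `ref_density`, `target_px`, `target_extent`) are illustrative, not the authors' API.

```python
# Hypothetical sketch: a sinusoidal positional encoding whose frequencies are
# modulated by the target's relative pixel density, so the same physical
# extent maps to a similar encoding regardless of image resolution.
import torch

def scale_aware_encoding(h, w, target_px, target_extent, d_model=256, ref_density=1.0):
    """1-D sinusoidal encoding over flattened H*W positions, scale-aware."""
    # Relative pixel density: pixels covering one unit of the target's
    # physical extent, normalized by an assumed reference density.
    density = (target_px / target_extent) / ref_density

    pos = torch.arange(h * w, dtype=torch.float32).unsqueeze(1)   # (HW, 1)
    dim = torch.arange(0, d_model, 2, dtype=torch.float32)        # (d_model/2,)
    # Base frequencies scaled by relative density.
    freq = density / (10000.0 ** (dim / d_model))                 # (d_model/2,)

    pe = torch.zeros(h * w, d_model)
    pe[:, 0::2] = torch.sin(pos * freq)
    pe[:, 1::2] = torch.cos(pos * freq)
    return pe                                                     # (HW, d_model)

# Example: a 16x16 feature map where the target spans 48 pixels of a
# 1.5-unit physical extent.
pe = scale_aware_encoding(16, 16, target_px=48.0, target_extent=1.5)
print(pe.shape)  # torch.Size([256, 256])
```

Likewise, a "split loss" that supervises low- and high-frequency reconstruction separately could look like the sketch below, with the weighting tied to target scale (high-frequency emphasis for large targets, low-frequency emphasis for tiny ones, as the abstract suggests). The Gaussian-blur band split and the linear weighting are assumptions, not the paper's exact losses.

```python
import torch.nn.functional as F

def split_frequency_loss(recon, target, scale, kernel_size=5, sigma=1.0):
    """L1 loss split into low- and high-frequency bands via Gaussian blur.
    `scale` in (0, 1]: larger values mean a larger on-screen target."""
    # Separable Gaussian kernel for the low-pass branch.
    half = kernel_size // 2
    x = torch.arange(-half, half + 1, dtype=torch.float32)
    g = torch.exp(-x ** 2 / (2 * sigma ** 2))
    g = g / g.sum()
    c = target.shape[1]
    kh = g.view(1, 1, 1, -1).expand(c, 1, 1, kernel_size)  # horizontal pass
    kv = g.view(1, 1, -1, 1).expand(c, 1, kernel_size, 1)  # vertical pass

    def lowpass(img):
        img = F.conv2d(img, kh, padding=(0, half), groups=c)
        return F.conv2d(img, kv, padding=(half, 0), groups=c)

    low_t, low_r = lowpass(target), lowpass(recon)
    loss_low = F.l1_loss(low_r, low_t)
    loss_high = F.l1_loss(recon - low_r, target - low_t)
    # Assumed scale-dependent weighting between the two bands.
    return (1 - scale) * loss_low + scale * loss_high
```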