TENet: Targetness entanglement incorporating with multi-scale pooling and mutually-guided fusion for RGB-E object tracking.

Journal: Neural Networks: the official journal of the International Neural Network Society
PMID:

Abstract

There is currently strong interest in improving visual object tracking by augmenting the RGB modality with the output of a visual event camera, which is particularly informative about scene motion. However, existing approaches perform event feature extraction for RGB-E tracking with traditional appearance models that have been optimised for RGB-only tracking, without adapting them to the intrinsic characteristics of the event data. To address this problem, we propose an Event backbone (Pooler), designed to obtain a high-quality feature representation that is cognisant of the innate characteristics of the event data, namely its sparsity. In particular, Multi-Scale Pooling is introduced to capture the motion feature trends within the event data by using diverse pooling kernel sizes. The association between the derived RGB and event representations is established by an innovative module performing adaptive Mutually Guided Fusion (MGF). Extensive experimental results show that our method significantly outperforms state-of-the-art trackers on two widely used RGB-E tracking datasets, VisEvent and COESOT, where the precision and success rates on COESOT are improved by 4.9% and 5.2%, respectively. Our code will be available at https://github.com/SSSpc333/TENet.
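The sketch below is a minimal, hedged illustration of the two ideas named in the abstract: pooling event features at several kernel sizes and fusing RGB and event tokens with mutual (bidirectional) guidance. The class names, tensor shapes, kernel sizes, and the cross-attention formulation are assumptions made for illustration; they are not the authors' implementation, which is to be released at the repository linked above.

```python
# Illustrative sketch only (PyTorch). All design details here are assumed,
# not taken from the TENet paper or its released code.
import torch
import torch.nn as nn


class MultiScalePooling(nn.Module):
    """Pools a sparse event feature map at several kernel sizes and fuses them."""

    def __init__(self, channels, kernel_sizes=(3, 5, 7)):
        super().__init__()
        # Stride-1 average pooling with "same" padding keeps the spatial size.
        self.pools = nn.ModuleList(
            nn.AvgPool2d(k, stride=1, padding=k // 2) for k in kernel_sizes
        )
        # 1x1 convolution merges the original map with its pooled variants.
        self.fuse = nn.Conv2d(channels * (len(kernel_sizes) + 1), channels, 1)

    def forward(self, x):  # x: (B, C, H, W) event feature map
        pooled = [x] + [pool(x) for pool in self.pools]
        return self.fuse(torch.cat(pooled, dim=1))


class MutuallyGuidedFusion(nn.Module):
    """Cross-attends RGB and event tokens in both directions, then adds residuals."""

    def __init__(self, dim, num_heads=4):
        super().__init__()
        self.rgb_from_event = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.event_from_rgb = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, rgb_tokens, event_tokens):  # each: (B, N, C)
        rgb_enh, _ = self.rgb_from_event(rgb_tokens, event_tokens, event_tokens)
        evt_enh, _ = self.event_from_rgb(event_tokens, rgb_tokens, rgb_tokens)
        return rgb_tokens + rgb_enh, event_tokens + evt_enh


if __name__ == "__main__":
    msp = MultiScalePooling(channels=64)
    mgf = MutuallyGuidedFusion(dim=64)
    event_map = torch.randn(2, 64, 16, 16)
    rgb_map = torch.randn(2, 64, 16, 16)
    event_feat = msp(event_map)
    rgb_tok = rgb_map.flatten(2).transpose(1, 2)     # (B, 256, 64)
    evt_tok = event_feat.flatten(2).transpose(1, 2)  # (B, 256, 64)
    fused_rgb, fused_evt = mgf(rgb_tok, evt_tok)
    print(fused_rgb.shape, fused_evt.shape)          # torch.Size([2, 256, 64]) twice
```

In this reading, the multi-scale pooling step smooths the sparse event responses at several receptive-field sizes before fusion, while the mutual guidance lets each modality refine the other rather than enhancing only one direction; the actual TENet modules may differ in structure and detail.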

Authors

  • Pengcheng Shao
    Josef Kittler Research Institute on Artificial Intelligence, China; Sino-UK Joint Laboratory on Artificial Intelligence, Ministry of Science and Technology, China; International Joint Laboratory on Artificial Intelligence, Ministry of Education, China; International Joint Laboratory on Artificial Intelligence, Jiangsu Province, China; School of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi 214122, China.
  • Tianyang Xu
    School of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi 214122, China; Jiangsu Provincial Engineering Laboratory of Pattern Recognition and Computational Intelligence, Jiangnan University, Wuxi 214122, China.
  • Zhangyong Tang
    Josef Kittler Research Institute on Artificial Intelligence, China; Sino-UK Joint Laboratory on Artificial Intelligence, Ministry of Science and Technology, China; International Joint Laboratory on Artificial Intelligence, Ministry of Education, China; International Joint Laboratory on Artificial Intelligence, Jiangsu Province, China; School of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi 214122, China.
  • Linze Li
    College of Mechanical and Electrical Engineering, Henan Agricultural University, Zhengzhou, 450002, China.
  • Xiao-Jun Wu
    Shandong Provincial Key Laboratory of Network based Intelligent Computing, University of Jinan, Jinan 250022, China. Electronic address: wu_xiaojun@jiangnan.edu.cn.
  • Josef Kittler
    Centre for Vision, Speech and Signal Processing (CVSSP), University of Surrey, GU2 7XH, United Kingdom. Electronic address: j.kittler@surrey.ac.uk.