Siamese network with a depthwise over-parameterized convolutional layer for visual tracking.

Journal: PloS one

Published Date: Aug 31, 2022

Abstract

Visual tracking is a fundamental research task in computer vision, with broad application prospects in areas such as military defense and civil security. In practice, it faces many challenges, including occlusion, fast motion and background clutter. Siamese-based trackers achieve strong tracking performance with a good balance between accuracy and speed. Deep feature extraction with a Convolutional Neural Network (CNN) is an essential component of the Siamese tracking framework. Although existing trackers make extensive use of deep features, they do not adequately exploit spatial structure and semantic information, both of which help enhance target representations. The lack of this spatial and semantic information may lead to tracking drift. In this paper, we design a CNN feature extraction subnetwork based on a Depthwise Over-parameterized Convolutional layer (DO-Conv). The layer uses a joint convolution that combines conventional and depthwise convolution. The depthwise convolution kernel explores each channel independently, which effectively extracts shallow spatial information and deep semantic information while discarding background information. Based on DO-Conv, we propose a novel tracking algorithm in the Siamese framework, named DOSiam. Extensive experiments on five benchmarks (OTB2015, VOT2016, VOT2018, GOT-10k and VOT2019-RGBT(TIR)) show that DOSiam achieves leading tracking performance against state-of-the-art trackers while running in real time at 60 FPS.
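The joint convolution mentioned in the abstract can be illustrated on a single receptive-field patch. The sketch below is not the authors' implementation. It assumes the kernel-composition form of the general DO-Conv formulation, in which a depthwise kernel D of shape (M*N, D_mul, C_in) and a conventional kernel W of shape (C_out, D_mul, C_in) can be folded into one conventional kernel of shape (C_out, M*N, C_in). All names here (patch, D, W, D_MUL, and the function names) are hypothetical and chosen only for illustration.

```python
import random

def depthwise_then_conv(patch, D, W):
    """Apply the depthwise kernel D per channel, then the conventional kernel W."""
    mn, c_in, d_mul = len(patch), len(patch[0]), len(D[0])
    # Depthwise step: channel c is transformed independently of the others.
    inter = [[sum(D[k][d][c] * patch[k][c] for k in range(mn))
              for c in range(c_in)]
             for d in range(d_mul)]
    # Conventional step: mix all channels into each output channel.
    return [sum(W_o[d][c] * inter[d][c]
                for d in range(d_mul) for c in range(c_in))
            for W_o in W]

def fold_kernels(D, W):
    """Fold D and W into one kernel of shape (C_out, M*N, C_in)."""
    mn, d_mul, c_in = len(D), len(D[0]), len(D[0][0])
    return [[[sum(D[k][d][c] * W_o[d][c] for d in range(d_mul))
              for c in range(c_in)]
             for k in range(mn)]
            for W_o in W]

def conv_patch(patch, kernel):
    """Ordinary convolution of one patch with a (C_out, M*N, C_in) kernel."""
    mn, c_in = len(patch), len(patch[0])
    return [sum(K_o[k][c] * patch[k][c]
                for k in range(mn) for c in range(c_in))
            for K_o in kernel]

def random_tensor(rng, *shape):
    """Nested lists of the given shape filled with uniform random values."""
    if len(shape) == 1:
        return [rng.uniform(-1.0, 1.0) for _ in range(shape[0])]
    return [random_tensor(rng, *shape[1:]) for _ in range(shape[0])]

# Assumed toy sizes: a 3x3 receptive field (M*N = 9), D_mul = 9,
# 4 input channels and 3 output channels.
MN, D_MUL, C_IN, C_OUT = 9, 9, 4, 3
rng = random.Random(0)
patch = random_tensor(rng, MN, C_IN)
D = random_tensor(rng, MN, D_MUL, C_IN)
W = random_tensor(rng, C_OUT, D_MUL, C_IN)

sequential = depthwise_then_conv(patch, D, W)
folded = fold_kernels(D, W)
folded_out = conv_patch(patch, folded)
max_diff = max(abs(a - b) for a, b in zip(sequential, folded_out))
print("largest difference between the two paths:", max_diff)
```

Under these assumptions the two paths agree up to floating-point rounding, and the folded kernel has the same shape as an ordinary convolution kernel, so the extra depthwise parameters would not add cost once the kernels are folded.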

Authors

Yuanyun Wang
School of Information Engineering, Nanchang Institute of Technology, Nanchang, Jiangxi, China.

Wenshuang Zhang
School of Information Engineering, Nanchang Institute of Technology, Nanchang, Jiangxi, China.

Limin Zhang
School of Information, University of Arizona, 1103 E. Second Street, Tucson, AZ 85705, USA.

Jun Wang
Department of Speech, Language, and Hearing Sciences and the Department of Neurology, The University of Texas at Austin, Austin, TX 78712, USA.

Keywords

Algorithms; Motion; Neural Networks, Computer; Semantics

External Resources

PubMed ID: 36044439
