Deep Convolutional Correlation Filter Learning Toward Robust Visual Object Tracking


Bibliographic Details
Published in: Chinese Control and Decision Conference, pp. 4313 - 4320
Main Authors: Bouraffa, Tayssir; Feng, Zihang; Wang, Yuxuan; Yan, Liping; Xia, Yuanqing; Xiao, Bo
Format: Conference Proceeding
Language: English
Published: IEEE, 15.08.2022
ISSN: 1948-9447
DOI: 10.1109/CCDC55256.2022.10034306

Summary: Recently, convolutional neural networks have been pervasively adopted in visual object tracking for their potential in discriminating the target from the surrounding background. Most visual object trackers extract deep features from a single layer, generally the last convolutional layer. However, these trackers are less effective when the target undergoes drastic appearance variations caused by challenging situations such as occlusion, illumination change, and background clutter. In this paper, a novel tracking algorithm is developed by introducing an elastic net constraint and contextual information into the convolutional network to track the desired target throughout a video sequence. Hierarchical features are extracted from both shallow and deep convolutional layers to further improve tracking accuracy and robustness. Because the deep convolutional layers capture important semantic information, they are more robust to target appearance variations; the shallow convolutional layers, in contrast, encode significant spatial details that allow the target to be localized precisely. Moreover, Peak-Strength Context-Aware correlation filters are applied to each convolutional layer's output, producing multi-level convolutional response maps that collaboratively identify the estimated position of the target in a coarse-to-fine manner. Quantitative and qualitative experiments on the widely used OTB-2015 benchmark show impressive results compared to state-of-the-art trackers.
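The record does not include the paper's actual fusion rule, but the coarse-to-fine idea it summarizes (a robust coarse location from the deep, semantic layer, refined by spatially precise shallow layers) can be sketched roughly as follows. This is an illustrative NumPy sketch only; the function name, the per-layer weights, and the fixed search radius are assumptions, not the authors' method.

```python
import numpy as np

def localize_coarse_to_fine(responses, weights, radius=2):
    """Illustrative coarse-to-fine localization from multi-level
    correlation-filter response maps (hypothetical helper, not the
    paper's algorithm).

    responses: list of 2-D response maps, ordered deep -> shallow,
               all resized to the same spatial dimensions.
    weights:   per-layer weights (assumed; deep layers are typically
               weighted for robustness, shallow ones for precision).
    radius:    how far (in pixels) a shallower layer may move the
               estimate away from the coarser one.
    """
    # Coarse estimate: peak of the deepest, most semantic response map.
    pos = np.unravel_index(np.argmax(responses[0]), responses[0].shape)
    for resp, w in zip(responses[1:], weights[1:]):
        # Refine: search the shallower (spatially detailed) map only
        # in a small window around the current estimate, so a strong
        # distractor elsewhere in the frame cannot hijack the track.
        r0 = max(pos[0] - radius, 0)
        r1 = min(pos[0] + radius + 1, resp.shape[0])
        c0 = max(pos[1] - radius, 0)
        c1 = min(pos[1] + radius + 1, resp.shape[1])
        window = w * resp[r0:r1, c0:c1]
        local = np.unravel_index(np.argmax(window), window.shape)
        pos = (r0 + local[0], c0 + local[1])
    return pos
```

In this sketch, a large distractor peak in a shallow map is ignored unless it falls inside the search window around the deep layer's coarse estimate, which is the robustness-then-precision trade-off the abstract describes.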