Hierarchical Tracking by Reinforcement Learning-Based Searching and Coarse-to-Fine Verifying

Bibliographic Details
Published in: IEEE Transactions on Image Processing, Vol. 28, no. 5, pp. 2331-2341
Main Authors: Zhong, Bineng; Bai, Bing; Li, Jun; Zhang, Yulun; Fu, Yun
Format: Journal Article
Language: English
Published: United States, IEEE, 01.05.2019
Publisher: The Institute of Electrical and Electronics Engineers, Inc. (IEEE)

Summary: A class-agnostic tracker typically consists of three key components: a motion model, a target appearance model, and an updating strategy. However, most recent top-performing trackers focus mainly on constructing complicated appearance models and updating strategies, while using comparatively simple, heuristic motion models that can lead to an inefficient search and degrade tracking performance. To address this issue, we propose a hierarchical tracker that learns to move and track by combining data-driven search at the coarse level with coarse-to-fine verification at the fine level. At the coarse level, a data-driven motion model learned through deep recurrent reinforcement learning provides the tracker with a coarse localization of the object. By formulating motion search as an action-decision problem in reinforcement learning, the tracker uses a deep Q-network built on a recurrent convolutional neural network to learn data-driven search policies effectively. The learned motion model not only significantly reduces the search space but also provides more reliable regions of interest for further verification. At the fine level, a kernelized correlation filter (KCF)-based appearance model densely yet efficiently verifies a local region centered on the location predicted by the motion model. Through the use of circulant matrices and the fast Fourier transform, the KCF-based appearance model can evaluate a large number of candidate samples in this local region both efficiently and effectively. Finally, a simple yet robust estimator is designed to analyze possible tracking failures. Experiments on OTB50 and OTB100 show that our tracker achieves better performance than state-of-the-art trackers.
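The fine-level claim above, that circulant matrices and the FFT let a KCF appearance model score a large number of candidate samples cheaply, can be illustrated with a minimal sketch of the standard single-channel Gaussian-kernel KCF. This is not the authors' implementation: raw grayscale pixels stand in for the paper's features, the cosine window used in practice is omitted, and the function names (`gaussian_correlation`, `gaussian_labels`, `train`, `detect`) and parameter values (`sigma`, `lam`, `label_sigma`) are hypothetical choices for illustration.

```python
import numpy as np


def gaussian_correlation(xf, zf, sigma):
    """Gaussian kernel between z and every cyclic shift of x, in one pass.

    xf, zf: 2-D FFTs of equally sized single-channel patches x and z.
    Returns the FFT of k, where k[m] = exp(-||z - shift_m(x)||^2 / (sigma^2 * n)).
    """
    n = xf.size
    # Squared norms via Parseval's theorem (numpy's fft2 is unnormalized).
    xx = np.sum(np.abs(xf) ** 2) / n
    zz = np.sum(np.abs(zf) ** 2) / n
    # Dot products <z, shift_m(x)> for all shifts m, computed in O(n log n).
    xz = np.real(np.fft.ifft2(zf * np.conj(xf)))
    # Clamp tiny negative values caused by floating-point round-off.
    dist = np.maximum(0.0, (xx + zz - 2.0 * xz) / n)
    return np.fft.fft2(np.exp(-dist / sigma ** 2))


def gaussian_labels(shape, sigma):
    """Regression target: a Gaussian peak at zero displacement (index [0, 0])."""
    h, w = shape
    dr = np.arange(h)
    dr = np.where(dr > h // 2, dr - h, dr)  # wrap row offsets to signed values
    dc = np.arange(w)
    dc = np.where(dc > w // 2, dc - w, dc)  # wrap column offsets to signed values
    return np.exp(-0.5 * (dr[:, None] ** 2 + dc[None, :] ** 2) / sigma ** 2)


def train(x, sigma=0.5, lam=1e-4, label_sigma=2.0):
    """Kernel ridge regression over all cyclic shifts of x, solved in the Fourier domain."""
    xf = np.fft.fft2(x)
    kf = gaussian_correlation(xf, xf, sigma)
    yf = np.fft.fft2(gaussian_labels(x.shape, label_sigma))
    alphaf = yf / (kf + lam)  # element-wise division replaces a kernel-matrix inversion
    return xf, alphaf


def detect(model_xf, alphaf, z, sigma=0.5):
    """Score every cyclic shift of search patch z; return best (dy, dx) and the response map."""
    zf = np.fft.fft2(z)
    kzf = gaussian_correlation(model_xf, zf, sigma)
    response = np.real(np.fft.ifft2(alphaf * kzf))
    r, c = np.unravel_index(np.argmax(response), response.shape)
    h, w = response.shape
    # Indices past the midpoint wrap around to negative displacements.
    dy = int(r) - h if r > h // 2 else int(r)
    dx = int(c) - w if c > w // 2 else int(c)
    return (dy, dx), response


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    template = rng.random((32, 32))  # stand-in for a feature patch around the target
    model_xf, alphaf = train(template)
    # Simulate a target that moved 3 px down and 5 px left inside the search window.
    search = np.roll(template, shift=(3, -5), axis=(0, 1))
    (dy, dx), _ = detect(model_xf, alphaf, search)
    print(dy, dx)
```

In a pipeline like the one the abstract describes, the search patch would be cropped around the location predicted by the coarse motion model, and the returned `(dy, dx)` would refine that coarse estimate. Because the kernel matrix over all cyclic shifts is circulant, the DFT diagonalizes it, so training and detection each need only a few FFTs and element-wise products rather than an explicit n-by-n solve.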

ISSN: 1057-7149 (print); 1941-0042 (electronic)
DOI: 10.1109/TIP.2018.2885238