Robust Tracking via Combining Top-down and Bottom-up Attention

Bibliographic Details
Published in	IEEE Transactions on Circuits and Systems for Video Technology p. 1
Main Authors	Li, Ning, Zhong, Bineng, Zheng, Yaozong, Liang, Qihua, Mo, Zhiyi, Song, Shuxiang
Format	Journal Article
Language	English
Published	IEEE 2024
Subjects	Bayes methods; bottom-up; Feature extraction; Generators; object tracking; Target tracking; Task analysis; top-down; Transformers; vision transformer; Visualization
Summary:	Transformer attention plays an important role in current top-performing trackers. However, it is bottom-up: it is driven by the stimulus and lacks intrinsic prior guidance. This bottom-up attention mechanism emphasizes all objects in the input images rather than only the task-related objects. As a result, the performance of trackers based on bottom-up attention deteriorates in complicated scenes. To address this issue, we propose TBTrack, a robust tracker that combines bottom-up attention with top-down attention while remaining compatible with the existing ViT framework. TBTrack not only uses the existing bottom-up attention mechanism to model long-range relationships among input tokens, but also uses a newly added top-down attention mechanism to focus on the task-related object and further suppress interference from similar objects and backgrounds. Specifically, we first design a top-down prior generation module that combines an adaptive learnable parameter with the template inputs to obtain top-down, task-guided signals. Then, we inject these prior signals into a bottom-up attention module to obtain a top-down and bottom-up attention combination block (TB-Block). Finally, we stack these TB-Blocks to construct our tracker (TBTrack), which has top-down prior guidance capability and focuses more on the task-related object. In extensive experiments, TBTrack achieves impressive performance on multiple tracking benchmarks, including GOT-10k, LaSOT, LaSOT_ext, TNL2K, TrackingNet, and UAV123. The code and trained models will be publicly available.
ISSN:	1051-8215 1558-2205
DOI:	10.1109/TCSVT.2024.3402436
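
Illustrative sketch (not the authors' implementation): the summary describes generating a top-down prior from the template and a learnable parameter, then injecting it into a bottom-up attention module. The record gives no equations, so the NumPy code below shows only one possible reading, under stated assumptions. The names topdown_prior and tb_attention, the use of the mean template token as the prior, the additive per-key bias on the attention logits, and the strength parameter are all hypothetical choices made for illustration.

```python
import numpy as np


def softmax(a, axis=-1):
    """Numerically stable softmax along the given axis."""
    a = a - a.max(axis=axis, keepdims=True)
    e = np.exp(a)
    return e / e.sum(axis=axis, keepdims=True)


def topdown_prior(template_tokens, learned_param):
    # Assumption: the prior is the mean template token, scaled elementwise
    # by a learnable vector. The paper's actual module may differ.
    return template_tokens.mean(axis=0) * learned_param


def tb_attention(tokens, prior, w_q, w_k, w_v, strength=1.0):
    """Single-head self-attention with a hypothetical top-down bias on the keys."""
    q, k, v = tokens @ w_q, tokens @ w_k, tokens @ w_v
    d = q.shape[-1]
    # Bottom-up term: ordinary scaled dot-product logits between all tokens.
    logits = q @ k.T / np.sqrt(d)
    # Top-down term (assumption): one score per key token, measuring how well
    # that key aligns with the projected prior; it is added to every query row.
    bias = strength * (k @ (prior @ w_k)) / np.sqrt(d)
    weights = softmax(logits + bias[None, :], axis=-1)
    return weights @ v, weights


# Toy inputs: 4 template tokens and 16 search tokens of width 8.
rng = np.random.default_rng(0)
dim = 8
template = rng.normal(size=(4, dim))
search = rng.normal(size=(16, dim))
tokens = np.concatenate([template, search], axis=0)
w_q, w_k, w_v = (rng.normal(size=(dim, dim)) / np.sqrt(dim) for _ in range(3))
learned_param = np.ones(dim)  # stands in for a trained parameter

prior = topdown_prior(template, learned_param)
out, weights = tb_attention(tokens, prior, w_q, w_k, w_v)
print(out.shape)  # (20, 8)
```

With strength=0.0 the function reduces to plain bottom-up self-attention, which makes the effect of the injected prior easy to isolate in this toy setting. A stack of such blocks would correspond loosely to the TB-Blocks named in the summary, although the published architecture should be consulted for the actual design.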