Robust Tracking via Combining Top-down and Bottom-up Attention

Bibliographic Details
Published in IEEE transactions on circuits and systems for video technology p. 1
Main Authors Li, Ning; Zhong, Bineng; Zheng, Yaozong; Liang, Qihua; Mo, Zhiyi; Song, Shuxiang
Format Journal Article
Language English
Published IEEE 2024
Summary: Transformer attention plays an important role in current top-performing trackers. However, it is bottom-up: it is driven by the stimulus and lacks intrinsic prior guidance. As a consequence, it emphasizes all objects in the input images rather than only the task-related ones, and the performance of trackers based on bottom-up attention deteriorates in complicated scenes. To address this issue, we propose TBTrack, a robust tracker that combines bottom-up attention with top-down attention while remaining compatible with the existing ViT framework. TBTrack not only uses the existing bottom-up attention mechanism to model long-range relationships among input tokens, but also uses a newly added top-down attention mechanism to focus on the task-related object and to suppress interference from similar objects and the background. Specifically, we first design a top-down prior generation module that combines an adaptive learnable parameter with the template inputs to obtain top-down, task-guided signals. Then, we inject these prior signals into a bottom-up attention module to obtain a combined top-down and bottom-up attention block (TB-Block). Finally, we stack these TB-Blocks to construct our tracker, TBTrack, whose top-down prior guidance lets it focus more on the task-related object. In extensive experiments, TBTrack achieves impressive performance on multiple tracking benchmarks, including GOT-10k, LaSOT, LaSOT_ext, TNL2K, TrackingNet, and UAV123. The code and trained models will be publicly available.
ISSN: 1051-8215, 1558-2205
DOI: 10.1109/TCSVT.2024.3402436
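
Illustrative sketch: The Summary names three steps (generate a top-down prior from the template and a learned parameter, inject it into bottom-up attention, stack the resulting blocks) but this record does not say how each step is computed. The toy Python below is one possible reading, not the authors' implementation. Everything in it is an assumption made for illustration: the function names `top_down_prior`, `tb_attention`, and `tb_stack`, the choice of a mean template token scaled by a learned vector as the prior, the additive logit bias as the injection mechanism, the `strength` parameter, and the identity query/key/value projections.

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def top_down_prior(template_tokens, learned_param):
    # Assumption: prior = mean template token, scaled element-wise by a learned vector.
    n = len(template_tokens)
    mean = [sum(col) / n for col in zip(*template_tokens)]
    return [m * p for m, p in zip(mean, learned_param)]

def tb_attention(tokens, prior, strength=1.0):
    # Bottom-up part: single-head self-attention with identity projections.
    # Top-down part (assumption): an additive logit bias for keys aligned with the prior.
    scale = math.sqrt(len(tokens[0]))
    bias = [strength * dot(k, prior) / scale for k in tokens]
    outputs, weights = [], []
    for q in tokens:
        w = softmax([dot(q, k) / scale + b for k, b in zip(tokens, bias)])
        weights.append(w)
        outputs.append([sum(wi * v[j] for wi, v in zip(w, tokens)) for j in range(len(q))])
    return outputs, weights

def tb_stack(tokens, prior, depth=2, strength=1.0):
    # Stack the block; in this sketch every layer reuses the same prior.
    for _ in range(depth):
        tokens, _weights = tb_attention(tokens, prior, strength)
    return tokens

template = [[1.0, 0.0], [1.0, 0.0]]   # toy template tokens
search = [[1.0, 0.0], [0.0, 1.0]]     # token 0 resembles the target, token 1 is a distractor
prior = top_down_prior(template, learned_param=[1.0, 1.0])

_, plain = tb_attention(search, prior, strength=0.0)   # bottom-up only
_, guided = tb_attention(search, prior, strength=2.0)  # with the top-down bias
```

In this toy setup, the attention weight that the distractor token places on the target-like token rises from about 0.33 (`plain[1][0]`) to about 0.67 (`guided[1][0]`), which mirrors the Summary's claim that top-down guidance shifts attention toward the task-related object. Setting `strength=0.0` recovers plain bottom-up attention, so the bias is a strict add-on to the existing mechanism in this sketch.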