Robust Tracking via Combining Top-down and Bottom-up Attention
Published in | IEEE transactions on circuits and systems for video technology p. 1 |
---|---|
Main Authors | , , , , , |
Format | Journal Article |
Language | English |
Published | IEEE 2024 |
Summary: | Transformer attention plays an important role in current top-performing trackers. However, it is bottom-up: it is driven by the stimulus and lacks intrinsic prior guidance. As a consequence, it emphasizes all objects in the input images rather than only the task-related ones, and the performance of trackers based on bottom-up attention deteriorates in complicated scenes. To address this issue, we propose TBTrack, a robust tracker that combines bottom-up attention with top-down attention while complying with the existing ViT framework. TBTrack not only uses the existing bottom-up attention mechanism to model long-range relationships among input tokens, but also uses a newly added top-down attention mechanism to focus on the task-related object and to suppress interference from similar objects and backgrounds. Specifically, we first design a top-down prior generation module that combines an adaptive learning parameter with the template inputs to obtain top-down task-guided signals. Then, we inject these prior signals into a bottom-up attention module to obtain a combined top-down and bottom-up attention block (TB-Block). Finally, we stack these TB-Blocks to construct our tracker, TBTrack, whose top-down prior guidance lets it focus more on the task-related object. In extensive experiments, TBTrack achieves impressive performance on multiple tracking benchmarks, including GOT-10k, LaSOT, LaSOT_ext, TNL2K, TrackingNet, UAV123, and others. The code and trained models will be publicly available. |
---|---|
ISSN: | 1051-8215 1558-2205 |
DOI: | 10.1109/TCSVT.2024.3402436 |
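
The summary describes three steps: generating a top-down prior from the template, injecting it into bottom-up attention, and stacking the resulting blocks. The sketch below is one possible reading of those steps, not the authors' implementation. Every name in it (`top_down_prior`, `tb_block`, `stack_tb_blocks`, `alpha`) is hypothetical. It also rests on three assumptions that the record does not confirm: the prior is the mean template token scaled by a scalar `alpha`, the injection is an additive bias on the attention logits, and the learned query/key/value projections of a real ViT block are replaced by identity maps.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def top_down_prior(template_tokens, alpha):
    """Assumed prior: mean template token scaled by a scalar alpha.

    alpha stands in for the 'adaptive learning parameter' of the summary;
    here it is a fixed number, not a trained one.
    """
    n = len(template_tokens)
    dim = len(template_tokens[0])
    return [alpha * sum(t[d] for t in template_tokens) / n for d in range(dim)]


def tb_block(tokens, prior):
    """Self-attention (bottom-up) with a prior-dependent logit bias (top-down).

    Assumption: the prior is injected by adding prior . key to every logit.
    Identity projections are used for queries, keys, and values.
    """
    dim = len(tokens[0])
    scale = 1.0 / math.sqrt(dim)
    outputs, weights = [], []
    for q in tokens:
        logits = [scale * (dot(q, k) + dot(prior, k)) for k in tokens]
        w = softmax(logits)
        weights.append(w)
        outputs.append([sum(wi * v[d] for wi, v in zip(w, tokens))
                        for d in range(dim)])
    return outputs, weights


def stack_tb_blocks(tokens, prior, depth):
    """Apply the same block `depth` times, mimicking a stacked backbone."""
    for _ in range(depth):
        tokens, _ = tb_block(tokens, prior)
    return tokens


# Toy scene: token 0 resembles the template, token 1 is a distractor.
template = [[1.0, 0.0], [1.0, 0.0]]
search = [[1.0, 0.0], [0.0, 1.0]]

_, plain = tb_block(search, top_down_prior(template, alpha=0.0))
_, guided = tb_block(search, top_down_prior(template, alpha=4.0))
print("attention without prior:", plain)
print("attention with prior:   ", guided)
```

With `alpha=0.0` the bias vanishes and the block reduces to plain bottom-up self-attention. With a positive `alpha`, every query shifts attention mass toward the template-like token, which is the qualitative effect the summary attributes to top-down guidance.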