Integration of Global and Local Knowledge for Foreground Enhancing in Weakly Supervised Temporal Action Localization
Weakly Supervised Temporal Action Localization (WTAL) aims to identify the temporal duration of actions and classify the action categories with only video-level labels in the training stage. Motivated by the intuition that the attention maps generated from various views will assist in enhancing the...
Saved in:
Published in | IEEE transactions on multimedia Vol. 26; pp. 8476 - 8487 |
---|---|
Main Authors | , , , |
Format | Journal Article |
Language | English |
Published |
IEEE
2024
|
Subjects | |
Online Access | Get full text |
Cover
Loading…
Summary: | Weakly Supervised Temporal Action Localization (WTAL) aims to identify the temporal duration of actions and classify the action categories with only video-level labels in the training stage. Motivated by the intuition that the attention maps generated from various views will assist in enhancing the foreground action temporal segments, in this paper we propose a WTAL pipeline based on a novel attention mechanism that effectively integrates global and local knowledge. Our attention mechanism is mainly composed of a global attention branch and a local attention branch. Specifically, the global attention branch is built on the inter-segment similarity to sparsely mine out the correlation knowledge within the entire video, while the local attention branch is built on the convolutional structure to densely aggregate the information within the fixed local respective field. Experiments on THUMOS14 and ActivityNet v1.3 datasets demonstrate the effectiveness of our proposed WTAL pipeline compared to state-of-the-art methods. |
---|---|
ISSN: | 1520-9210 1941-0077 |
DOI: | 10.1109/TMM.2024.3379887 |