Prior-enhanced Temporal Action Localization using Subject-aware Spatial Attention
Main Authors | , , , , |
---|---|
Format | Journal Article |
Language | English |
Published | 09.11.2022 |
DOI | 10.48550/arxiv.2211.05299 |
Summary: | Temporal action localization (TAL) aims to detect the boundary and identify
the class of each action instance in a long untrimmed video. Current approaches
treat video frames homogeneously, and tend to give background and key objects
excessive attention. This limits their sensitivity in localizing action
boundaries. To address this, we propose a prior-enhanced temporal action
localization method (PETAL), which only takes in RGB input and incorporates
action subjects as priors. This proposal leverages action subjects' information
with a plug-and-play subject-aware spatial attention module (SA-SAM) to
generate an aggregated and subject-prioritized representation. Experimental
results on THUMOS-14 and ActivityNet-1.3 datasets demonstrate that the proposed
PETAL achieves competitive performance using only RGB features, e.g., boosting
mAP by 2.41% and 0.25% over the state-of-the-art approach that uses RGB features
alone or with additional optical flow features, respectively, on the THUMOS-14 dataset. |
---|---|