Centerness-Aware Network for Temporal Action Proposal

Bibliographic Details
Published in: IEEE Transactions on Circuits and Systems for Video Technology, Vol. 32, No. 1, pp. 5-16
Main Authors: Liu, Yuan; Chen, Jingyuan; Chen, Xinpeng; Deng, Bing; Huang, Jianqiang; Hua, Xian-Sheng
Format: Journal Article
Language: English
Published: New York: The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 01.01.2022
Summary: Temporal action proposal generation aims at localizing the temporal segments of a video that contain human actions. This work proposes the centerness-aware network (CAN), a novel one-stage approach that generates action proposals as keypoint triplets. A keypoint triplet consists of two boundary points (starting and ending) and one center point. Specifically, for each temporal location in the video, we evaluate the probability that it lies at a boundary or within the center region of a ground-truth action proposal. CAN optimizes the predicted boundary points interactively, in a bidirectional adaptation form, by exploiting the dependencies between them. Furthermore, to accurately locate the center points of action proposals with different time spans, temporal feature pyramids are used to explicitly incorporate multi-scale information. Using the three generated keypoints, CAN efficiently retrieves temporal proposals by grouping keypoints into triplets when they are geometrically aligned. Experiments show that CAN achieves state-of-the-art performance on the public THUMOS-14 and ActivityNet-1.3 datasets. Moreover, further experiments demonstrate that applying action classifiers to the proposals generated by CAN yields state-of-the-art performance in temporal action localization.
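
To make the grouping step concrete, the sketch below shows one plausible way per-location start, end, and center probabilities could be combined into scored proposal triplets. It is a minimal illustration, not the authors' implementation: the threshold value, the midpoint-based alignment check, and the product scoring rule are assumptions introduced here.

```python
# Minimal sketch (assumptions, not CAN's exact procedure) of grouping
# start/end/center keypoints into temporal proposals.
import numpy as np

def group_triplets(start_prob, end_prob, center_prob, thresh=0.5):
    """Return (start, end, score) proposals from per-location keypoint probabilities."""
    T = len(start_prob)
    starts = [t for t in range(T) if start_prob[t] >= thresh]
    ends = [t for t in range(T) if end_prob[t] >= thresh]
    proposals = []
    for s in starts:
        for e in ends:
            if e <= s:                    # a valid segment needs start before end
                continue
            c = (s + e) // 2              # geometric alignment: check the segment midpoint
            if center_prob[c] >= thresh:  # keep triplets whose center is also confident
                score = start_prob[s] * end_prob[e] * center_prob[c]
                proposals.append((s, e, float(score)))
    return sorted(proposals, key=lambda p: -p[2])  # highest-scoring first

# Toy example: a single action roughly spanning locations 3..8.
T = 12
start_prob = np.full(T, 0.1); start_prob[3] = 0.9
end_prob = np.full(T, 0.1); end_prob[8] = 0.85
center_prob = np.full(T, 0.1); center_prob[5] = 0.8; center_prob[6] = 0.8
print(group_triplets(start_prob, end_prob, center_prob))
```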
ISSN: 1051-8215, 1558-2205
DOI: 10.1109/TCSVT.2021.3075607