Centerness-Aware Network for Temporal Action Proposal

Bibliographic Details
Published in: IEEE Transactions on Circuits and Systems for Video Technology, Vol. 32, No. 1, pp. 5-16
Main Authors: Liu, Yuan; Chen, Jingyuan; Chen, Xinpeng; Deng, Bing; Huang, Jianqiang; Hua, Xian-Sheng
Format: Journal Article
Language: English
Published: New York: The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 01.01.2022
Summary: Temporal action proposal generation aims at localizing the temporal segments of a video that contain human actions. This work proposes the centerness-aware network (CAN), a novel one-stage approach that generates action proposals as keypoint triplets. A keypoint triplet consists of two boundary points (starting and ending) and one center point. Specifically, for each temporal location in the video, we evaluate the probability that it lies at a boundary or within the center region of a ground-truth action proposal. CAN optimizes the predicted boundary points interactively, in a bidirectional adaptation form, by exploiting the dependencies between them. Furthermore, to accurately locate the center points of action proposals with different time spans, temporal feature pyramids are used to explicitly incorporate multi-scale information. Using the three generated keypoints, CAN efficiently retrieves temporal proposals by grouping keypoints into triplets when they are geometrically aligned. Experiments show that CAN achieves state-of-the-art performance on the public THUMOS-14 and ActivityNet-1.3 datasets. Moreover, further experiments demonstrate that applying action classifiers to the proposals generated by CAN yields state-of-the-art performance in temporal action localization.
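
To make the grouping step concrete, the sketch below shows one plausible way per-location start, end, and center probabilities could be combined into scored proposal triplets. It is a minimal illustration, not the authors' implementation: the threshold value, the midpoint-based alignment check, and the product scoring rule are assumptions introduced here.

```python
# Minimal sketch (assumptions, not CAN's exact procedure) of grouping
# start/end/center keypoints into temporal proposals.
import numpy as np

def group_triplets(start_prob, end_prob, center_prob, thresh=0.5):
    """Return (start, end, score) proposals from per-location keypoint probabilities."""
    T = len(start_prob)
    starts = [t for t in range(T) if start_prob[t] >= thresh]
    ends = [t for t in range(T) if end_prob[t] >= thresh]
    proposals = []
    for s in starts:
        for e in ends:
            if e <= s:                    # a valid segment needs start before end
                continue
            c = (s + e) // 2              # geometric alignment: check the segment midpoint
            if center_prob[c] >= thresh:  # keep triplets whose center is also confident
                score = start_prob[s] * end_prob[e] * center_prob[c]
                proposals.append((s, e, float(score)))
    return sorted(proposals, key=lambda p: -p[2])  # highest-scoring first

# Toy example: a single action roughly spanning locations 3..8.
T = 12
start_prob = np.full(T, 0.1); start_prob[3] = 0.9
end_prob = np.full(T, 0.1); end_prob[8] = 0.85
center_prob = np.full(T, 0.1); center_prob[5] = 0.8; center_prob[6] = 0.8
print(group_triplets(start_prob, end_prob, center_prob))
```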
ISSN: 1051-8215, 1558-2205
DOI: 10.1109/TCSVT.2021.3075607