Rethinking the Faster R-CNN Architecture for Temporal Action Localization

Bibliographic Details
Published in: 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 1130-1139
Main Authors: Chao, Yu-Wei; Vijayanarasimhan, Sudheendra; Seybold, Bryan; Ross, David A.; Deng, Jia; Sukthankar, Rahul
Format: Conference Proceeding
Language: English
Published: IEEE, 01.06.2018
Summary: We propose TAL-Net, an improved approach to temporal action localization in video that is inspired by the Faster R-CNN object detection framework. TAL-Net addresses three key shortcomings of existing approaches: (1) we improve receptive field alignment using a multi-scale architecture that can accommodate extreme variation in action durations; (2) we better exploit the temporal context of actions for both proposal generation and action classification by appropriately extending receptive fields; and (3) we explicitly consider multi-stream feature fusion and demonstrate that fusing motion late is important. We achieve state-of-the-art performance for both action proposal and localization on the THUMOS'14 detection benchmark and competitive performance on the ActivityNet challenge.
ISSN: 1063-6919
DOI: 10.1109/CVPR.2018.00124
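
To make the abstract's points (1) and (3) concrete, the following is a minimal, hypothetical Python sketch, not the authors' released code: it builds the 1D analogue of Faster R-CNN's multi-scale anchors (one candidate segment per scale, centered at each time step) and fuses per-segment class scores from independent RGB and optical-flow streams after classification (late fusion). The function names, the scale set, and the numpy-based toy data are illustrative assumptions.

import numpy as np

def generate_temporal_anchors(num_steps, scales=(2, 4, 8, 16, 32)):
    """1D analogue of Faster R-CNN anchors: at every time step, place one
    candidate segment per scale, centered on that step. A multi-scale set
    like this is one way to cover large variation in action durations."""
    anchors = []
    for t in range(num_steps):
        for s in scales:
            start, end = t - s / 2.0, t + s / 2.0
            anchors.append((start, end))
    return np.array(anchors)  # shape: (num_steps * len(scales), 2)

def late_fuse_scores(rgb_scores, flow_scores, flow_weight=0.5):
    """Late fusion: each stream is classified independently and the
    per-class score vectors are combined afterwards (here, a weighted
    average), rather than concatenating features before classification."""
    return (1.0 - flow_weight) * rgb_scores + flow_weight * flow_scores

if __name__ == "__main__":
    anchors = generate_temporal_anchors(num_steps=100)
    print("anchors:", anchors.shape)  # (500, 2)

    # Toy per-segment class scores from two independent streams.
    rng = np.random.default_rng(0)
    rgb = rng.random((anchors.shape[0], 20))   # e.g. 20 action classes
    flow = rng.random((anchors.shape[0], 20))
    fused = late_fuse_scores(rgb, flow)
    print("fused scores:", fused.shape)       # (500, 20)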