CNN-Based Multiple Path Search for Action Tube Detection in Videos
Published in | IEEE Transactions on Circuits and Systems for Video Technology, Vol. 30, no. 1, pp. 104-116
Main Authors | , ,
Format | Journal Article
Language | English
Published | New York: IEEE, 01.01.2020 (The Institute of Electrical and Electronics Engineers, Inc.)
Summary | This paper presents an effective two-stream convolutional neural network (CNN)-based approach to detect multiple spatial-temporal action tubes in videos. A novel video localization refinement (VLR) scheme is first proposed to iteratively rectify potentially inaccurate bounding boxes by exploiting the temporal consistency between adjacent frames. Then, to provide more faithful detection scores, a new fusion strategy is introduced that combines not only the appearance and flow information of the two-stream networks but also motion saliency, the latter included to account for small camera motion. In addition, an efficient multiple path search (MPS) algorithm is developed to identify multiple paths simultaneously in a single run. In the forward message passing of MPS, each node stores a prescribed number of connections based on the accumulated scores determined in the previous stages. A backward path tracing is then invoked to recover all paths at once by fully reusing the information generated in the forward pass, without repeating the search, so the incurred complexity is reduced. Simulation results show that, together with VLR and the new fusion scheme, the proposed MPS generally provides superior performance compared with state-of-the-art methods on four public datasets.
ISSN | 1051-8215; 1558-2205
DOI | 10.1109/TCSVT.2018.2887283
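The forward/backward procedure that the summary attributes to MPS can be illustrated with a minimal sketch: a Viterbi-style forward pass in which each node keeps its top-K accumulated scores and predecessor pointers, followed by a single backward trace that reads off several high-scoring paths at once. The per-detection scores and the `link` function below are illustrative assumptions, not the paper's actual detection or overlap scores.

```python
def mps(frame_scores, link, K=2):
    """Sketch of a multiple path search over per-frame detections.

    frame_scores: list over frames of per-detection scores.
    link(t, i, j): linking score between detection i in frame t and
    detection j in frame t+1 (e.g., a box-overlap measure).
    Returns the K best paths as (accumulated score, detection indices).
    """
    T = len(frame_scores)
    # entries[t][j]: up to K tuples (accumulated score, pred index, pred rank)
    entries = [[[] for _ in f] for f in frame_scores]
    for j, s in enumerate(frame_scores[0]):
        entries[0][j] = [(s, None, None)]

    # Forward message passing: each node stores its K best connections
    # based on the accumulated scores from the previous stage.
    for t in range(1, T):
        for j, s in enumerate(frame_scores[t]):
            cands = []
            for i, prev in enumerate(entries[t - 1]):
                for r, (acc, _, _) in enumerate(prev):
                    cands.append((acc + link(t - 1, i, j) + s, i, r))
            cands.sort(reverse=True)
            entries[t][j] = cands[:K]

    # Backward path tracing: reuse the stored pointers to recover
    # several paths at once, without repeating the search.
    finals = [(acc, j, r)
              for j, e in enumerate(entries[T - 1])
              for r, (acc, _, _) in enumerate(e)]
    finals.sort(reverse=True)
    paths = []
    for acc, j, r in finals[:K]:
        path, t = [], T - 1
        while j is not None:
            path.append(j)
            _, j, r = entries[t][j][r]
            t -= 1
        paths.append((acc, path[::-1]))
    return paths


# Hypothetical usage: three frames with two detections each; the linking
# score is a stand-in for a box-overlap term.
scores = [[0.9, 0.1], [0.2, 0.8], [0.7, 0.3]]
best = mps(scores, lambda t, i, j: 0.5 if i == j else 0.0, K=2)
# best[0] is the top-scoring tube, best[1] the runner-up
```

Keeping K entries per node (rather than only the single best predecessor) is what lets one forward pass support extracting multiple paths in the backward trace.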