CNN-Based Multiple Path Search for Action Tube Detection in Videos


Bibliographic Details
Published in IEEE Transactions on Circuits and Systems for Video Technology Vol. 30; no. 1; pp. 104-116
Main Authors Alwando, Erick Hendra Putra; Chen, Yie-Tarng; Fang, Wen-Hsien
Format Journal Article
Language English
Published New York IEEE 01.01.2020
The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Summary: This paper presents a two-stream convolutional neural network (CNN)-based approach for detecting multiple spatio-temporal action tubes in videos. First, a novel video localization refinement (VLR) scheme iteratively corrects potentially inaccurate bounding boxes by exploiting the temporal consistency between adjacent frames. Then, to provide more faithful detection scores, a new fusion strategy combines the appearance and flow information of the two-stream networks with motion saliency, which is included to account for small camera motion. In addition, an efficient multiple path search (MPS) algorithm identifies multiple paths simultaneously in a single run. In the forward message passing of MPS, each node stores information on a prescribed number of connections, chosen according to the accumulated scores determined in the previous stages. A backward path tracing step then recovers all of the paths at once by reusing the information generated in the forward pass, so the search does not have to be repeated for each path and the computational complexity is reduced. Simulation results on four public datasets show that the proposed MPS, together with VLR and the new fusion scheme, generally outperforms state-of-the-art methods.
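The forward pass and backward tracing described in the summary can be illustrated with a small k-best dynamic-programming sketch. This is not the authors' implementation: the summary does not give the scoring function or the exact bookkeeping, so the sketch assumes that a link between boxes in adjacent frames is scored by the detection score plus a weighted IoU overlap, and that each node keeps its k best incoming connections. The names multi_path_search, iou, k, and overlap_weight are hypothetical, and every frame is assumed to contain at least one box.

```python
import heapq
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def multi_path_search(
    boxes: List[List[Box]],
    scores: List[List[float]],
    k: int = 2,
    overlap_weight: float = 1.0,
) -> List[Tuple[float, List[int]]]:
    """Return up to k best paths as (total score, box index per frame).

    Assumption (not from the paper): a link is scored as
    detection score + overlap_weight * IoU with the previous box.
    """
    n_frames = len(boxes)
    # table[t][j] holds up to k entries (accumulated score, previous box,
    # previous rank) for box j in frame t, sorted from best to worst.
    table = [[[(s, -1, -1)] for s in scores[0]]]

    # Forward pass: each node keeps only its k best incoming connections.
    for t in range(1, n_frames):
        frame_entries = []
        for j, box in enumerate(boxes[t]):
            candidates = []
            for i, prev_box in enumerate(boxes[t - 1]):
                link = overlap_weight * iou(prev_box, box)
                for r, (acc, _, _) in enumerate(table[t - 1][i]):
                    candidates.append((acc + scores[t][j] + link, i, r))
            frame_entries.append(heapq.nlargest(k, candidates))
        table.append(frame_entries)

    # Backward tracing: pick the k best final entries and follow the stored
    # (previous box, previous rank) pointers; no second search is needed.
    finals = [
        (entry[0], j, r)
        for j, entries in enumerate(table[-1])
        for r, entry in enumerate(entries)
    ]
    paths = []
    for total, j, r in heapq.nlargest(k, finals):
        path = []
        for t in range(n_frames - 1, -1, -1):
            path.append(j)
            _, j, r = table[t][j][r]
        paths.append((total, path[::-1]))
    return paths


if __name__ == "__main__":
    # Two well-separated boxes repeated over three frames (toy data).
    toy_boxes = [[(0, 0, 10, 10), (50, 50, 60, 60)]] * 3
    toy_scores = [[0.9, 0.8]] * 3
    for total, path in multi_path_search(toy_boxes, toy_scores, k=2):
        print(round(total, 2), path)
```

In this sketch the k returned paths are simply the k highest-scoring ones and may share boxes. The summary does not say how the published method treats overlapping paths or how it turns paths into action tubes, so that part is left out.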
ISSN: 1051-8215, 1558-2205
DOI: 10.1109/TCSVT.2018.2887283