CNN-Based Multiple Path Search for Action Tube Detection in Videos
Published in | IEEE Transactions on Circuits and Systems for Video Technology, Vol. 30, no. 1, pp. 104-116
Main Authors | , ,
Format | Journal Article
Language | English
Published | New York: IEEE, 01.01.2020 (The Institute of Electrical and Electronics Engineers, Inc.)
Summary | This paper presents an effective two-stream convolutional neural network (CNN)-based approach to detect multiple spatial-temporal action tubes in videos. A novel video localization refinement (VLR) scheme is first proposed to iteratively rectify potentially inaccurate bounding boxes by exploiting the temporal consistency between adjacent frames. Then, to provide more faithful detection scores, a new fusion strategy is introduced that combines not only the appearance and flow information of the two-stream networks but also motion saliency, the latter included to account for small camera motion. In addition, an efficient multiple path search (MPS) algorithm is developed to identify multiple paths simultaneously in a single run. In the forward message passing of MPS, each node stores a prescribed number of connections based on the accumulated scores determined in the previous stages. A backward path tracing is then invoked to recover all paths at once by fully reusing the information generated in the forward pass, without repeating the search, so the incurred complexity is reduced. Simulation results show that, together with VLR and the new fusion scheme, the proposed MPS generally provides superior performance compared with state-of-the-art methods on four public datasets.
ISSN | 1051-8215; 1558-2205
DOI | 10.1109/TCSVT.2018.2887283
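The forward/backward procedure that the summary attributes to MPS can be illustrated with a minimal sketch: a Viterbi-style forward pass in which each node keeps its top-K accumulated scores and predecessor pointers, followed by a single backward trace that reads off several high-scoring paths at once. The per-detection scores and the `link` function below are illustrative assumptions, not the paper's actual detection or overlap scores.

```python
def mps(frame_scores, link, K=2):
    """Sketch of a multiple path search over per-frame detections.

    frame_scores: list over frames of per-detection scores.
    link(t, i, j): linking score between detection i in frame t and
    detection j in frame t+1 (e.g., a box-overlap measure).
    Returns the K best paths as (accumulated score, detection indices).
    """
    T = len(frame_scores)
    # entries[t][j]: up to K tuples (accumulated score, pred index, pred rank)
    entries = [[[] for _ in f] for f in frame_scores]
    for j, s in enumerate(frame_scores[0]):
        entries[0][j] = [(s, None, None)]

    # Forward message passing: each node stores its K best connections
    # based on the accumulated scores from the previous stage.
    for t in range(1, T):
        for j, s in enumerate(frame_scores[t]):
            cands = []
            for i, prev in enumerate(entries[t - 1]):
                for r, (acc, _, _) in enumerate(prev):
                    cands.append((acc + link(t - 1, i, j) + s, i, r))
            cands.sort(reverse=True)
            entries[t][j] = cands[:K]

    # Backward path tracing: reuse the stored pointers to recover
    # several paths at once, without repeating the search.
    finals = [(acc, j, r)
              for j, e in enumerate(entries[T - 1])
              for r, (acc, _, _) in enumerate(e)]
    finals.sort(reverse=True)
    paths = []
    for acc, j, r in finals[:K]:
        path, t = [], T - 1
        while j is not None:
            path.append(j)
            _, j, r = entries[t][j][r]
            t -= 1
        paths.append((acc, path[::-1]))
    return paths


# Hypothetical usage: three frames with two detections each; the linking
# score is a stand-in for a box-overlap term.
scores = [[0.9, 0.1], [0.2, 0.8], [0.7, 0.3]]
best = mps(scores, lambda t, i, j: 0.5 if i == j else 0.0, K=2)
# best[0] is the top-scoring tube, best[1] the runner-up
```

Keeping K entries per node (rather than only the single best predecessor) is what lets one forward pass support extracting multiple paths in the backward trace.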