Magnitude-Orientation Stream network and depth information applied to activity recognition

Bibliographic Details
Published in: Journal of Visual Communication and Image Representation, Vol. 63, p. 102596
Main Authors: Caetano, Carlos, de Melo, Victor H.C., Brémond, François, dos Santos, Jefersson A., Schwartz, William Robson
Format: Journal Article
Language: English
Published: Elsevier Inc., 01.08.2019
ISSN: 1047-3203; 1095-9076
DOI: 10.1016/j.jvcir.2019.102596

More Information
Summary:

Highlights:
• Magnitude weighted by depth to circumvent problems related to camera distance.
• Detailed review of the literature, including recently published works.
• A study of the behavior of the Magnitude-Orientation Stream (MOS).

The temporal component of videos provides an important clue for activity recognition, as a number of activities can be reliably recognized from motion information alone. In view of that, this work proposes a novel temporal stream for two-stream convolutional networks based on images computed from the optical flow magnitude and orientation, named Magnitude-Orientation Stream (MOS), to learn motion in a better and richer manner. Our method applies simple non-linear transformations to the vertical and horizontal components of the optical flow to generate input images for the temporal stream. Moreover, we also employ depth information as a weighting scheme on the magnitude to compensate for the distance between the subjects performing the activity and the camera. Experimental results, carried out on two well-known datasets (UCF101 and NTU), demonstrate that using our proposed temporal stream as input to existing neural network architectures can improve their performance for activity recognition. The results show that our temporal stream provides complementary information able to improve classical two-stream methods, indicating the suitability of our approach as a temporal video representation.
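The core computation described in the abstract — deriving magnitude and orientation images from the horizontal and vertical optical flow components, with an optional depth weighting on the magnitude — can be sketched as follows. This is an illustrative reconstruction, not the authors' implementation: the paper's exact non-linear transformations and depth-normalization scheme are not given in the abstract, so the `mos_input` function, its normalization choices, and the `depth / depth.max()` weighting are assumptions for demonstration only.

```python
import numpy as np

def mos_input(flow_x, flow_y, depth=None):
    """Build magnitude and orientation images from optical flow components.

    flow_x, flow_y : 2-D arrays with the horizontal/vertical flow per pixel.
    depth          : optional 2-D array of per-pixel depth; if given, the
                     magnitude is weighted by it (hypothetical scheme) so
                     that subjects far from the camera are not penalized.
    Returns two arrays rescaled to [0, 255], suitable as CNN input channels.
    """
    # Euclidean magnitude of the flow vector at each pixel.
    magnitude = np.sqrt(flow_x ** 2 + flow_y ** 2)
    # Flow orientation in radians, mapped from [-pi, pi] to [0, 2*pi).
    orientation = np.mod(np.arctan2(flow_y, flow_x), 2.0 * np.pi)

    if depth is not None:
        # Assumed weighting: scale magnitude by depth normalized to [0, 1].
        magnitude = magnitude * (depth / depth.max())

    def to_image(x):
        # Min-max rescale to [0, 255]; constant inputs map to all zeros.
        rng = x.max() - x.min()
        if rng == 0:
            return np.zeros_like(x)
        return (x - x.min()) / rng * 255.0

    return to_image(magnitude), to_image(orientation)
```

In a two-stream setup, the resulting magnitude and orientation images would feed the temporal stream while raw RGB frames feed the spatial stream.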