Blockwise Temporal-Spatial Pathway Network

Bibliographic Details
Published in	2021 IEEE International Conference on Image Processing (ICIP), pp. 3677-3681
Main Authors	Hong, SeulGi; Choi, Min-Kook
Format	Conference Proceeding
Language	English
Published	IEEE 19.09.2021
Subjects	3DCNN; Action Recognition; Adaptation models; Feature extraction; Fuses; Image coding; Image recognition; Temporal-Spatial Representation; Three-dimensional displays; Visualization

More Information
Summary:	Algorithms for video action recognition must consider temporal relations as well as spatial information, which remains challenging. We propose a 3D-CNN-based action recognition model, the blockwise temporal-spatial pathway network (BTSNet), which adjusts its temporal and spatial receptive fields through multiple pathways. The design is inspired by an adaptive kernel selection-based model, an architecture for effective feature encoding that adaptively chooses spatial receptive fields for image recognition. Extending this approach to the temporal domain, our model extracts temporal and channel-wise attention and fuses information from various candidate operations. We evaluated the proposed model on the UCF-101, HMDB-51, SVW, and Epic-Kitchen datasets and showed that it generalizes well without pretraining. BTSNet also provides interpretable visualization based on spatiotemporal channel-wise attention. Based on this visualization, we confirm that the blockwise temporal-spatial pathway supports a better representation for 3D convolutional blocks.
ISSN:	2381-8549
DOI:	10.1109/ICIP42928.2021.9506113
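
The summary describes fusing several pathways with channel-wise attention but gives no formulas. The following is a minimal sketch of that general idea, following the selective-kernel pattern the summary names as its inspiration. It is not the authors' implementation. The function `fuse_pathways`, the `logit_weights` matrices, and the flattened `[channels][positions]` layout are all hypothetical, and only channel-wise attention is modeled (the temporal attention mentioned in the summary is left out).

```python
import math
from typing import List, Tuple

# A feature map is stored as [channels][positions], where "positions" stands
# for the flattened T*H*W grid of a 3D convolutional block's output.
FeatureMap = List[List[float]]


def softmax(values: List[float]) -> List[float]:
    """Numerically stable softmax over a short list of logits."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_pathways(
    pathways: List[FeatureMap],
    logit_weights: List[List[List[float]]],
) -> Tuple[FeatureMap, List[List[float]]]:
    """Fuse pathway outputs with per-channel softmax attention across pathways.

    pathways:      K feature maps of identical shape [C][N], e.g. the outputs
                   of branches with different temporal/spatial kernel sizes.
    logit_weights: K hypothetical [C][C] matrices (an assumption of this
                   sketch) mapping the pooled descriptor to per-channel logits.
    Returns the fused [C][N] map and the attention table [C][K].
    """
    num_paths = len(pathways)
    num_channels = len(pathways[0])
    num_positions = len(pathways[0][0])

    # 1. Sum the pathways, then average each channel over all positions.
    descriptor = [
        sum(p[c][n] for p in pathways for n in range(num_positions)) / num_positions
        for c in range(num_channels)
    ]

    # 2. One logit per (pathway, channel), computed from the shared descriptor.
    logits = [
        [sum(w[c][j] * descriptor[j] for j in range(num_channels))
         for c in range(num_channels)]
        for w in logit_weights
    ]

    # 3. Softmax across pathways, separately for every channel.
    attention = [
        softmax([logits[k][c] for k in range(num_paths)])
        for c in range(num_channels)
    ]

    # 4. Attention-weighted sum of the pathway outputs.
    fused = [
        [sum(attention[c][k] * pathways[k][c][n] for k in range(num_paths))
         for n in range(num_positions)]
        for c in range(num_channels)
    ]
    return fused, attention


# Toy usage: two pathways, two channels, two positions. All-zero weights give
# every pathway equal attention, so the fused map is the plain average.
short_range = [[1.0, 3.0], [2.0, 2.0]]
long_range = [[3.0, 5.0], [4.0, 0.0]]
zero = [[0.0, 0.0], [0.0, 0.0]]
fused, attention = fuse_pathways([short_range, long_range], [zero, zero])
```

Because each row of `attention` sums to one, the table can be read per channel as a preference among pathways. That is the kind of quantity an attention-based visualization could plot, though how the paper produces its visualizations is not stated in this record.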