Action Tube Extraction Based 3D-CNN for RGB-D Action Recognition
Published in | 2018 International Conference on Content Based Multimedia Indexing (CBMI) pp. 1 - 6 |
---|---|
Main Authors | , , |
Format | Conference Proceeding Publication |
Language | English |
Published | IEEE (Institute of Electrical and Electronics Engineers), 01.09.2018 |
Summary: | In this paper we propose a novel action tube extractor for RGB-D action recognition in trimmed videos. The action tube extractor takes as input a video and outputs an action tube. The method consists of two parts: spatial tube extraction and temporal sampling. The first part is built upon MobileNet-SSD and its role is to define the spatial region where the action takes place. The second part is based on the structural similarity index (SSIM) and is designed to remove frames without obvious motion from the primary action tube. The final extracted action tube has two benefits: 1) a higher ratio of ROI (subjects of action) to background; 2) most frames contain obvious motion change. We propose to use a two-stream (RGB and Depth) I3D architecture as our 3D-CNN model. Our approach outperforms the state-of-the-art methods on the OA and NTU RGB-D datasets. |
---|---|
ISBN: | 1538670216 9781538670217 |
DOI: | 10.1109/CBMI.2018.8516450 |
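
The temporal sampling step described in the summary (dropping frames without obvious motion using SSIM) could be sketched roughly as follows. This is only an illustrative sketch, not the authors' implementation: the summary does not state the SSIM variant, the comparison rule, or the threshold, so the single-window `global_ssim`, the compare-against-last-kept-frame rule, and the `threshold=0.95` default are all assumptions, and the function names are hypothetical.

```python
import numpy as np


def global_ssim(a, b, data_range=255.0):
    """SSIM computed over the whole frame as one window.

    Assumption: standard SSIM averages many local windows; a single
    global window is used here only to keep the sketch short.
    """
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    c1 = (0.01 * data_range) ** 2  # usual SSIM stabilising constants
    c2 = (0.03 * data_range) ** 2
    mu_a, mu_b = a.mean(), b.mean()
    var_a, var_b = a.var(), b.var()
    cov = ((a - mu_a) * (b - mu_b)).mean()
    num = (2 * mu_a * mu_b + c1) * (2 * cov + c2)
    den = (mu_a ** 2 + mu_b ** 2 + c1) * (var_a + var_b + c2)
    return num / den


def sample_motion_frames(frames, threshold=0.95):
    """Return indices of frames that differ enough from the last kept frame.

    Assumption: a frame is treated as lacking obvious motion when its
    SSIM to the most recently kept frame is at or above `threshold`.
    """
    if len(frames) == 0:
        return []
    kept = [0]  # always keep the first frame as the reference
    for i in range(1, len(frames)):
        if global_ssim(frames[kept[-1]], frames[i]) < threshold:
            kept.append(i)
    return kept


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    still = rng.integers(0, 256, size=(32, 32))
    moved = rng.integers(0, 256, size=(32, 32))
    # The duplicated frame is dropped; the changed frame is kept.
    print(sample_motion_frames([still, still.copy(), moved]))
```

In practice the frames would be grayscale crops from the spatial tube, and a windowed SSIM from an image-processing library would likely be a closer match to the metric the summary names.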