Video-Based Human Action Recognition Using Spatial Pyramid Pooling and 3D Densely Convolutional Networks

In recent years, the application of deep neural networks to human behavior recognition has become a hot topic. Although remarkable achievements have been made in the field of image recognition, there are still many problems to be solved in the area of video. It is well known that convolutional neura...

Full description

Saved in:

Bibliographic Details
Published in	Future internet Vol. 10; no. 12; p. 115
Main Authors	Yang, Wanli, Chen, Yimin, Huang, Chen, Gao, Mingke
Format	Journal Article
Language	English
Published	Basel MDPI AG 01.12.2018
Subjects	3D convolution Accuracy action recognition Artificial intelligence Artificial neural networks Classification CNN dense connectivity Human activity recognition Human behavior Human motion Information seeking behavior International conferences Internet Neural networks Object recognition Pattern recognition spatial pyramid pooling Success Surveillance Canada United States > US China
Online Access	Get full text

Cover

Loading…

More Information
Summary:	In recent years, the application of deep neural networks to human behavior recognition has become a hot topic. Although remarkable achievements have been made in the field of image recognition, there are still many problems to be solved in the area of video. It is well known that convolutional neural networks require a fixed size image input, which not only limits the network structure but also affects the recognition accuracy. Although this problem has been solved in the field of images, it has not yet been broken through in the field of video. To address the input problem of fixed size video frames in video recognition, we propose a three-dimensional (3D) densely connected convolutional network based on spatial pyramid pooling (3D-DenseNet-SPP). As the name implies, the network structure is mainly composed of three parts: 3DCNN, DenseNet, and SPPNet. Our models were evaluated on a KTH dataset and UCF101 dataset separately. The experimental results showed that our model has better performance in the field of video-based behavior recognition in comparison to the existing models.
ISSN:	1999-5903 1999-5903
DOI:	10.3390/fi10120115