Efficient Local Feature Encoding for Human Action Recognition with Approximate Sparse Coding

Bibliographic Details
Published in IEICE Transactions on Information and Systems, Vol. E99.D, no. 4, pp. 1212-1220
Main Authors KATO, Jien; WANG, Yu
Format Journal Article
Language English
Published The Institute of Electronics, Information and Communication Engineers 01.04.2016
ISSN 0916-8532, 1745-1361
DOI 10.1587/transinf.2015EDP7333

More Information
Summary: Local spatio-temporal features are popular in the human action recognition task. In practice, they are usually coupled with a feature encoding approach, which produces the video-level vector representations used in learning and recognition. In this paper, we present an efficient local feature encoding approach called Approximate Sparse Coding (ASC). In the off-line learning phase, ASC computes the sparse codes for a large collection of prototype local feature descriptors using Sparse Coding (SC). In the encoding phase, it looks up the precomputed sparse code of the nearest prototype for each to-be-encoded local feature using Approximate Nearest Neighbour (ANN) search. It combines the low dimensionality of SC with the high speed of ANN search, both desirable properties for a local feature encoding approach. ASC has been extensively evaluated on the KTH dataset and the HMDB51 dataset. We confirmed that it can efficiently encode a large quantity of local video features into discriminative, low-dimensional representations.
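
The two phases described in the Summary can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the sparse codes are computed with a plain ISTA solver, an exact brute-force nearest-neighbour lookup stands in for the ANN search, the dictionary is random rather than learned, and the final max pooling into a video-level vector is an assumed choice. The names `sparse_code_ista` and `ApproxSparseCoder` are hypothetical.

```python
import numpy as np


def sparse_code_ista(X, D, lam=0.1, n_iter=100):
    """Sparse codes for the rows of X over dictionary D (atoms are rows).

    Minimises 0.5 * ||x - z D||^2 + lam * ||z||_1 per row with ISTA.
    ISTA is an assumed solver; the record does not name one.
    """
    # Step size from the Lipschitz constant of the gradient: sigma_max(D)^2.
    lipschitz = np.linalg.norm(D, ord=2) ** 2
    Z = np.zeros((X.shape[0], D.shape[0]))
    for _ in range(n_iter):
        grad = (Z @ D - X) @ D.T
        Z = Z - grad / lipschitz
        # Soft-thresholding keeps the codes sparse.
        Z = np.sign(Z) * np.maximum(np.abs(Z) - lam / lipschitz, 0.0)
    return Z


class ApproxSparseCoder:
    """Hypothetical helper: precompute codes off-line, look them up on-line."""

    def __init__(self, prototypes, dictionary, lam=0.1):
        self.prototypes = prototypes
        # Off-line learning phase: one sparse code per prototype descriptor.
        self.codes = sparse_code_ista(prototypes, dictionary, lam=lam)

    def encode(self, features):
        # Encoding phase: find the nearest prototype of each feature.
        # Assumption: exact brute-force search stands in for ANN search here.
        d2 = (
            (features ** 2).sum(axis=1)[:, None]
            - 2.0 * features @ self.prototypes.T
            + (self.prototypes ** 2).sum(axis=1)[None, :]
        )
        nearest = d2.argmin(axis=1)
        return self.codes[nearest]


# Toy data: 16 unit-norm atoms, 200 prototypes, 8-dimensional descriptors.
rng = np.random.default_rng(0)
dictionary = rng.standard_normal((16, 8))
dictionary /= np.linalg.norm(dictionary, axis=1, keepdims=True)
prototypes = rng.standard_normal((200, 8))

coder = ApproxSparseCoder(prototypes, dictionary)

# Features of one "video": slightly perturbed copies of the first 5 prototypes.
video_features = prototypes[:5] + 1e-3 * rng.standard_normal((5, 8))
codes = coder.encode(video_features)

# Assumed pooling step: max over absolute code values gives one vector per video.
video_vector = np.abs(codes).max(axis=0)
```

In this sketch the per-feature cost at encoding time is a single nearest-neighbour query plus a table lookup, since no optimisation problem is solved on-line. Replacing the brute-force search with a real ANN index would change only the body of `encode`.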