Actionness-Pooled Deep-Convolutional Descriptor

Bibliographic Details
Published in: 2018 IEEE International Conference on Multimedia and Expo (ICME), pp. 1-6
Main Authors: Han, Tingting; Yao, Hongxun; Sun, Xiaoshuai; Xie, Wenlong; Zhang, Yanhao
Format: Conference Proceeding
Language: English
Published: IEEE, 01.07.2018

Summary: Recognition of general actions has achieved great breakthroughs in recent years. However, real-world applications often call for finer-grained action classification. The major challenge is that fine-grained actions usually share high similarities in both appearance and motion pattern, making them difficult to distinguish with existing general action representations. To solve this problem, we introduce a visual attention mechanism into the proposed descriptor, termed the Actionness-pooled Deep-convolutional Descriptor (ADD). Instead of pooling features uniformly from the entire video, we aggregate features in sub-regions that are more likely to contain actions according to actionness maps, which endows ADD with the capability of capturing the subtle differences between fine-grained actions. We conduct experiments on the HIT Dances dataset, one of the few existing datasets for fine-grained action analysis. Quantitative results demonstrate that ADD remarkably outperforms the traditional two-stream representation. Extensive experiments on two general action benchmarks, JHMDB and UCF101, additionally show that combining ADD with an end-to-end ConvNet can further boost recognition performance.
ISSN: 1945-788X
DOI: 10.1109/ICME.2018.8486535
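
Below is a minimal sketch of the pooling idea the summary describes: weighting convolutional features by an actionness map instead of pooling uniformly over the whole frame. It is an illustration only, not the authors' implementation; the function name, the NumPy-only setup, and the plain weighted average are assumptions, and the paper's actual actionness maps and video-level aggregation are more involved.

import numpy as np

def actionness_pooled_descriptor(feature_map, actionness_map):
    # feature_map:    (C, H, W) ConvNet activations for one frame.
    # actionness_map: (H, W) scores, higher where an action is more likely.
    # Normalize the actionness scores into a spatial weighting distribution.
    weights = actionness_map / (actionness_map.sum() + 1e-8)
    # Weighted sum over spatial locations yields one value per channel,
    # so sub-regions with high actionness dominate the descriptor.
    return np.einsum('chw,hw->c', feature_map, weights)

# Toy usage: a 512-channel 7x7 feature map with a random actionness map.
feat = np.random.rand(512, 7, 7).astype(np.float32)
act = np.random.rand(7, 7).astype(np.float32)
print(actionness_pooled_descriptor(feat, act).shape)  # (512,)

With a uniform actionness map this reduces to ordinary average pooling; concentrating the weights on action regions is what lets the descriptor pick up the subtle differences between fine-grained actions.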