Action Recognition Improved by Correlations and Attention of Subjects and Scene

Bibliographic Details
Published in Visual communications and image processing (Online) pp. 1-5
Main Authors Ha, Manh-Hung; Chen, Oscal Tzyh-Chiang
Format Conference Proceeding
Language English
Published IEEE 05.12.2021

Summary: Comprehensive activity understanding of multiple subjects in a video requires subject detection, action identification, and behavior interpretation, as well as modeling of the interactions among subjects and background. This work develops action recognition for one or more subjects based on the correlations and interactions between the whole scene and the subject(s), using a Deep Neural Network (DNN). The proposed DNN consists of a 3D Convolutional Neural Network (CNN), a Spatial Attention (SA) generation layer, a mapping convolutional fused-depth layer, a Transformer Encoder (TE), and two fully connected layers with late fusion for final classification. In particular, the attention mechanisms in SA and TE are used to extract meaningful action information in the spatial and temporal domains, respectively, to enhance recognition performance. The experimental results show that the proposed DNN achieves accuracies of 97.8%, 98.4%, and 85.6% on the traffic police, UCF101-24, and JHMDB-21 datasets, respectively. The proposed DNN is therefore a strong classifier for action recognition involving one or multiple subjects.
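The summary names two mechanisms that a toy example may make more concrete: attention-weighted pooling over spatial locations, and late fusion of two classification heads. The sketch below is only an illustration under stated assumptions, not the authors' implementation. The function names, the softmax-based attention weights, and the weighted-average fusion rule (with a hypothetical `weight_a` parameter) are all assumptions; the record does not specify how SA weights are computed or how the two heads are fused.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def spatial_attention_pool(features, saliency):
    """Pool per-location feature vectors using attention weights.

    features: list of equal-length feature vectors, one per spatial location.
    saliency: list of raw attention scores, one per location (assumed given).
    Returns a single feature vector: the attention-weighted sum.
    """
    weights = softmax(saliency)
    dim = len(features[0])
    return [sum(w * f[d] for w, f in zip(weights, features)) for d in range(dim)]


def late_fusion(head_a_logits, head_b_logits, weight_a=0.5):
    """Fuse two classification heads at the probability level.

    Assumption: fusion is a weighted average of per-head softmax outputs;
    the record only says "late fusion" without giving the rule.
    Returns (fused_probabilities, predicted_class_index).
    """
    p_a = softmax(head_a_logits)
    p_b = softmax(head_b_logits)
    fused = [weight_a * a + (1.0 - weight_a) * b for a, b in zip(p_a, p_b)]
    predicted = max(range(len(fused)), key=fused.__getitem__)
    return fused, predicted


if __name__ == "__main__":
    pooled = spatial_attention_pool([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0])
    fused, label = late_fusion([2.0, 0.5, 0.1], [0.2, 1.5, 0.3])
    print(pooled, fused, label)
```

Fusing at the probability level keeps each head independently trainable and makes the contribution of each head easy to inspect, which is one common reason to prefer late fusion over concatenating features earlier in the network.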
ISSN: 2642-9357
DOI: 10.1109/VCIP53242.2021.9675340