Context-aware Cascade Attention-based RNN for Video Emotion Recognition

Emotion recognition can provide crucial information about the user in many applications when building human-computer interaction (HCI) systems. Most of current researches on visual emotion recognition are focusing on exploring facial features. However, context information including surrounding envir...

Full description

Saved in:

Bibliographic Details
Main Authors	Sun, Man-Chin, Hsu, Shih-Huan, Yang, Min-Chun, Chien, Jen-Hsien
Format	Journal Article
Language	English
Published	30.05.2018
Subjects	Computer Science - Computer Vision and Pattern Recognition
Online Access	Get full text

Cover

Loading…

More Information
Summary:	Emotion recognition can provide crucial information about the user in many applications when building human-computer interaction (HCI) systems. Most of current researches on visual emotion recognition are focusing on exploring facial features. However, context information including surrounding environment and human body can also provide extra clues to recognize emotion more accurately. Inspired by "sequence to sequence model" for neural machine translation, which models input and output sequences by an encoder and a decoder in recurrent neural network (RNN) architecture respectively, a novel architecture, "CACA-RNN", is proposed in this work. The proposed network consists of two RNNs in a cascaded architecture to process both context and facial information to perform video emotion classification. Results of the model were submitted to video emotion recognition sub-challenge in Multimodal Emotion Recognition Challenge (MEC2017). CACA-RNN outperforms the MEC2017 baseline (mAP of 21.7%): it achieved mAP of 45.51% on the testing set in the video only challenge.
DOI:	10.48550/arxiv.1805.12098