Context-aware Cascade Attention-based RNN for Video Emotion Recognition
Main Authors | , , , |
---|---|
Format | Journal Article |
Language | English |
Published | 30.05.2018 |
Subjects | |
Summary: | Emotion recognition can provide crucial information about the user in many
applications when building human-computer interaction (HCI) systems. Most
current research on visual emotion recognition focuses on facial features.
However, contextual information, including the surrounding environment and the
human body, can also provide additional clues for recognizing emotion more
accurately. Inspired by the sequence-to-sequence model for neural machine
translation, which models the input sequence with an encoder and the output
sequence with a decoder, both recurrent neural networks (RNNs), this work
proposes a novel architecture, CACA-RNN. The network cascades two RNNs to
process both context and facial information for video emotion classification.
The model's results were submitted to the video emotion recognition
sub-challenge of the Multimodal Emotion Recognition Challenge (MEC2017).
CACA-RNN achieved an mAP of 45.51% on the test set of the video-only challenge,
outperforming the MEC2017 baseline mAP of 21.7%. |
---|---|
DOI: | 10.48550/arxiv.1805.12098 |
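To illustrate the cascade idea from the summary, the sketch below shows an untrained forward pass through two chained RNNs in plain Python. It rests on several assumptions that the abstract does not confirm: the first RNN reads per-frame context features, its final hidden state initializes a second RNN that reads per-frame facial features, both use simple tanh (Elman) cells rather than whatever cells the paper used, and a dot-product attention over the first RNN's states is concatenated with the second RNN's final state before a linear classifier. All names (`CascadeRNNSketch`, `rnn_states`, `attend`), dimensions and the class count are hypothetical.

```python
import math
import random


def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def rnn_states(xs, W_x, W_h, h0):
    """Elman RNN: h_t = tanh(W_x x_t + W_h h_{t-1}); returns every hidden state."""
    h = h0
    states = []
    for x in xs:
        h = [math.tanh(a + b) for a, b in zip(matvec(W_x, x), matvec(W_h, h))]
        states.append(h)
    return states


def softmax(z):
    """Numerically stable softmax."""
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def attend(query, keys):
    """Dot-product attention (an assumed choice): weighted sum of keys."""
    weights = softmax([sum(q * k for q, k in zip(query, key)) for key in keys])
    dim = len(keys[0])
    summary = [sum(w * key[i] for w, key in zip(weights, keys)) for i in range(dim)]
    return summary, weights


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


class CascadeRNNSketch:
    """Hypothetical two-RNN cascade; weights are random, not trained."""

    def __init__(self, ctx_dim, face_dim, hidden, n_classes, seed=0):
        rng = random.Random(seed)
        self.hidden = hidden
        self.enc_Wx = rand_matrix(hidden, ctx_dim, rng)   # first RNN: context input
        self.enc_Wh = rand_matrix(hidden, hidden, rng)
        self.dec_Wx = rand_matrix(hidden, face_dim, rng)  # second RNN: face input
        self.dec_Wh = rand_matrix(hidden, hidden, rng)
        self.W_out = rand_matrix(n_classes, 2 * hidden, rng)

    def forward(self, ctx_frames, face_frames):
        # Run the first RNN over the context sequence from a zero state.
        enc = rnn_states(ctx_frames, self.enc_Wx, self.enc_Wh, [0.0] * self.hidden)
        # Cascade: the second RNN starts from the first RNN's final state.
        dec = rnn_states(face_frames, self.dec_Wx, self.dec_Wh, enc[-1])
        # Attend over the context states using the last face state as the query.
        summary, weights = attend(dec[-1], enc)
        logits = matvec(self.W_out, dec[-1] + summary)  # list concatenation
        return softmax(logits), weights


if __name__ == "__main__":
    model = CascadeRNNSketch(ctx_dim=4, face_dim=3, hidden=5, n_classes=8)
    rng = random.Random(1)
    ctx = [[rng.random() for _ in range(4)] for _ in range(6)]
    face = [[rng.random() for _ in range(3)] for _ in range(7)]
    probs, attn = model.forward(ctx, face)
    print(len(probs), len(attn))
```

Passing the first RNN's final state into the second mirrors how a sequence-to-sequence encoder hands its summary to the decoder; a real model would use trained LSTM or GRU cells and learned frame features.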
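The summary reports results as mAP without defining it. For a single-label emotion classification challenge, one common reading is macro-averaged precision: precision computed per emotion class, then averaged over classes. The sketch below assumes that reading, which should be checked against the MEC2017 evaluation rules; the function name and the choice to score a never-predicted class as 0 are hypothetical.

```python
def macro_average_precision(y_true, y_pred, classes):
    """Average of per-class precision (assumed meaning of the reported mAP)."""
    precisions = []
    for c in classes:
        # True labels of all samples the model assigned to class c.
        predicted = [t for t, p in zip(y_true, y_pred) if p == c]
        # Precision is undefined if c is never predicted; counted as 0 here (assumption).
        hits = sum(1 for t in predicted if t == c)
        precisions.append(hits / len(predicted) if predicted else 0.0)
    return sum(precisions) / len(classes)


if __name__ == "__main__":
    y_true = ["happy", "sad", "sad", "angry"]
    y_pred = ["happy", "sad", "angry", "angry"]
    # angry: 1/2, happy: 1/1, sad: 1/1 -> (0.5 + 1 + 1) / 3
    print(macro_average_precision(y_true, y_pred, ["angry", "happy", "sad"]))
```

Because every class contributes equally, a macro average rewards performance on rare emotion classes, unlike overall accuracy.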