Deep multi-path convolutional neural network joint with salient region attention for facial expression recognition

•A deep-based model for solving facial expression recognition is proposed.•Propose a Salient Expression Region Descriptor (SERD) to locate expression-sensitive regions and refine deep features.•Construct an encoder-decoder framework to disentangle multiple variations (e.g. ages, races, poses, etc.)...

Full description

Saved in:

Bibliographic Details
Published in	Pattern recognition Vol. 92; pp. 177 - 191
Main Authors	Xie, Siyue, Hu, Haifeng, Wu, Yongbo
Format	Journal Article
Language	English
Published	Elsevier Ltd 01.08.2019
Subjects	Attention Convolutional neural network Facial expression recognition Multi-Path variation-suppressing network Salient expressional region descriptor Facial expression recognition Multi-Path variation-suppressing network Salient expressional region descriptor Convolutional neural network Attention
Online Access	Get full text

Cover

Loading…

More Information
Summary:	•A deep-based model for solving facial expression recognition is proposed.•Propose a Salient Expression Region Descriptor (SERD) to locate expression-sensitive regions and refine deep features.•Construct an encoder-decoder framework to disentangle multiple variations (e.g. ages, races, poses, etc.) from features.•Experiments on six famous expression databases verify the effectiveness of the proposed model. Facial Expression Recognition (FER) has long been a challenging task in the field of computer vision. In this paper, we present a novel model, named Deep Attentive Multi-path Convolutional Neural Network (DAM-CNN), for FER. Different from most existing models, DAM-CNN can automatically locate expression-related regions in an expressional image and yield a robust image representation for FER. The proposed model contains two novel modules: an attention-based Salient Expressional Region Descriptor (SERD) and the Multi-Path Variation-Suppressing Network (MPVS-Net). SERD can adaptively estimate the importance of different image regions for FER task, while MPVS-Net disentangles expressional information from irrelevant variations. By jointly combining SERD and MPVS-Net, DAM-CNN is able to highlight expression-relevant features and generate a variation-robust representation for expression classification. Extensive experimental results on both constrained datasets (CK+, JAFFE, TFEID) and unconstrained datasets (SFEW, FER2013, BAUM-2i) demonstrate the effectiveness of our DAM-CNN model.
ISSN:	0031-3203 1873-5142
DOI:	10.1016/j.patcog.2019.03.019