OAENet: Oriented attention ensemble for accurate facial expression recognition

Bibliographic Details
Published in: Pattern Recognition, Vol. 112, p. 107694
Main Authors: Wang, Zhengning; Zeng, Fanwei; Liu, Shuaicheng; Zeng, Bing
Format: Journal Article
Language: English
Published: Elsevier Ltd, 01.04.2021
Summary:
•We propose an Oriented Attention Enable Network (OAENet) architecture for FER, which aggregates an ROI-aware module and an attention mechanism, ensuring sufficient utilization of both global and local features.
•We propose a weighted mask that combines facial landmarks and correlation coefficients, which proves effective in improving attention on local regions.
•Our method achieves state-of-the-art performance on several leading datasets such as CK+, RAF-DB and AffectNet.

Facial Expression Recognition (FER) is a challenging yet important research topic owing to its academic and commercial potential. In this work, we propose an oriented attention pseudo-siamese network that takes advantage of global and local facial information for highly accurate FER. Our network consists of two branches: a maintenance branch composed of several convolutional blocks to exploit high-level semantic features, and an attention branch with a UNet-like architecture to capture locally salient information. Specifically, we first feed the face image into the maintenance branch. For the attention branch, we calculate the correlation coefficient between a face and each of its sub-regions. Next, we construct a weighted mask by combining the facial landmarks and the correlation coefficients, and send this mask to the attention branch. Finally, the two branches are fused to output the classification results. In this way, a direction-dependent attention mechanism is established to remedy the insufficient utilization of local information. With the help of this attention mechanism, our network not only captures the global picture but also concentrates on important local areas. Experiments are carried out on four leading facial expression datasets, where our method achieves very appealing performance compared with other state-of-the-art methods.
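The abstract describes a weighted mask that combines facial landmark locations with per-region correlation coefficients, but leaves its exact construction unspecified. A minimal NumPy sketch of one plausible reading is shown below; the function names, the Gaussian spread, the max-combination rule, and the example coordinates and coefficients are all our assumptions for illustration, not details taken from the paper.

```python
import numpy as np

def gaussian_bump(h, w, cy, cx, sigma):
    """2-D Gaussian centred at (cy, cx) on an h x w grid, peak value 1."""
    ys, xs = np.mgrid[0:h, 0:w]
    return np.exp(-((ys - cy) ** 2 + (xs - cx) ** 2) / (2.0 * sigma ** 2))

def weighted_mask(shape, landmarks, coeffs, sigma=8.0):
    """Build an attention mask: one Gaussian bump per landmark, scaled by
    that region's correlation coefficient; where bumps overlap, the
    stronger weight wins. Values stay in [0, max(coeffs)]."""
    h, w = shape
    mask = np.zeros((h, w))
    for (cy, cx), c in zip(landmarks, coeffs):
        mask = np.maximum(mask, c * gaussian_bump(h, w, cy, cx, sigma))
    return mask

# Hypothetical example: three landmark centres with assumed, precomputed
# face/sub-region correlation coefficients (e.g. around eyes and mouth).
landmarks = [(30, 30), (30, 90), (80, 60)]
coeffs = [0.9, 0.7, 0.5]
mask = weighted_mask((128, 128), landmarks, coeffs)
```

Under this reading, the mask would then be supplied to the UNet-like attention branch so that regions with higher correlation to the full face receive stronger attention.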
ISSN:0031-3203
1873-5142
DOI:10.1016/j.patcog.2020.107694