OAENet: Oriented attention ensemble for accurate facial expression recognition
Published in | Pattern recognition Vol. 112; p. 107694 |
---|---|
Format | Journal Article |
Language | English |
Published | Elsevier Ltd, 01.04.2021 |
Summary: | •We propose an Oriented Attention Enable Network (OAENet) architecture for FER, which aggregates ROI-aware and attention mechanisms, ensuring sufficient utilization of both global and local features.•We propose a weighted mask that combines facial landmarks and correlation coefficients, which proves effective in improving attention on local regions.•Our method achieves state-of-the-art performance on several leading datasets such as CK+, RAF-DB and AffectNet.
Facial Expression Recognition (FER) is a challenging yet important research topic owing to its academic and commercial potential. In this work, we propose an oriented attention pseudo-siamese network that takes advantage of global and local facial information for highly accurate FER. Our network consists of two branches: a maintenance branch, composed of several convolutional blocks, that exploits high-level semantic features, and an attention branch with a UNet-like architecture that obtains local highlight information. Specifically, we first feed the face image into the maintenance branch. For the attention branch, we calculate the correlation coefficient between a face and its sub-regions. Next, we construct a weighted mask by combining the facial landmarks and the correlation coefficients. Then, the weighted mask is sent to the attention branch. Finally, the outputs of the two branches are fused to produce the classification results. In this way, a direction-dependent attention mechanism is established to remedy the insufficient utilization of local information. With the help of this attention mechanism, our network not only captures the global picture but also concentrates on important local areas. Experiments are carried out on four leading facial expression datasets, where our method achieves very appealing performance compared with other state-of-the-art methods. |
ISSN: | 0031-3203 1873-5142 |
DOI: | 10.1016/j.patcog.2020.107694 |
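The weighted-mask step described in the summary can be illustrated with a minimal sketch: a Gaussian bump is placed at each facial landmark and scaled by a correlation coefficient between that landmark's local patch and the whole face. This is an assumption-laden toy version, not the paper's implementation: the function names (`region_correlation`, `weighted_mask`), the histogram-based correlation, and all parameters are hypothetical stand-ins for whatever OAENet actually uses.

```python
import numpy as np

def region_correlation(face, region):
    # One plausible reading of the paper's coefficient: Pearson correlation
    # between the intensity histograms of the full face and a sub-region.
    h_face, _ = np.histogram(face, bins=32, range=(0, 1), density=True)
    h_reg, _ = np.histogram(region, bins=32, range=(0, 1), density=True)
    return float(np.corrcoef(h_face, h_reg)[0, 1])

def weighted_mask(face, landmarks, patch=8, sigma=3.0):
    """Build a soft attention mask: a Gaussian bump at each landmark,
    scaled by the correlation between the landmark's patch and the face."""
    H, W = face.shape
    yy, xx = np.mgrid[0:H, 0:W]
    mask = np.zeros((H, W))
    for (y, x) in landmarks:
        y0, y1 = max(0, y - patch), min(H, y + patch)
        x0, x1 = max(0, x - patch), min(W, x + patch)
        w = abs(region_correlation(face, face[y0:y1, x0:x1]))
        mask += w * np.exp(-((yy - y) ** 2 + (xx - x) ** 2) / (2 * sigma ** 2))
    m = mask.max()
    return mask / m if m > 0 else mask  # normalize to [0, 1]
```

In the paper's pipeline, such a mask would be fed to the UNet-like attention branch, while the raw face image goes to the maintenance branch; here the mask construction alone is shown because it is the part the summary spells out step by step.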