AF-Transformer: Attention Fusion Transformer for Facial Expression Recognition
Published in | 2022 3rd International Conference on Computer Vision, Image and Deep Learning & International Conference on Computer Engineering and Applications (CVIDL & ICCEA), pp. 939 - 942
---|---
Main Author |
Format | Conference Proceeding
Language | English
Published | IEEE, 20.05.2022
Subjects |
Summary | Facial expression recognition (FER) in the wild is exceedingly difficult due to occlusions, variable head poses, face deformations, and motion blur under unconstrained conditions. Despite significant progress in automated FER over the last few decades, previous studies were mostly developed for laboratory-controlled settings; real-world occlusions, varied head poses, and complicated backgrounds leave regions poorly informed and significantly raise the difficulty of FER. Unlike prior purely CNN-based approaches, we argue that converting face images into visual word sequences and performing expression recognition globally is viable and practical. We therefore develop the Attention Fusion Transformer (AF-Transformer) as a two-step solution to FER in the wild. First, we propose Attention Fusion (AF) to combine the two feature maps produced by a dual-branch CNN; by combining multiple features with global-local attention, AF captures discriminative information. The fused feature maps are then flattened and projected into visual word sequences. Second, inspired by the success of Transformers in natural language processing, we propose modelling the relationships among these visual words with global self-attention. The proposed approach is evaluated on three publicly available in-the-wild facial expression datasets, and extensive experiments show that it outperforms comparable methods.
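The pipeline sketched in the abstract (fuse two CNN feature maps with attention weights, flatten the result into visual word tokens, then relate all tokens with global self-attention) can be illustrated with a minimal numpy sketch. All shapes, the energy-based fusion rule, and every function name here are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of the AF-Transformer stages described in the
# abstract; the fusion rule and shapes are assumptions for illustration.
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention_fusion(f1, f2):
    """Fuse two (C, H, W) feature maps with per-location attention weights."""
    # Score each branch by its channel-wise energy at every spatial location
    # (a stand-in for the paper's global-local attention).
    s1 = (f1 ** 2).mean(axis=0)               # (H, W)
    s2 = (f2 ** 2).mean(axis=0)
    w = softmax(np.stack([s1, s2]), axis=0)   # (2, H, W), sums to 1 per pixel
    return w[0] * f1 + w[1] * f2              # (C, H, W)

def to_visual_words(fmap, proj):
    """Flatten (C, H, W) to H*W tokens and project each to d_model dims."""
    C, H, W = fmap.shape
    tokens = fmap.reshape(C, H * W).T         # (H*W, C): one token per location
    return tokens @ proj                      # (H*W, d_model)

def self_attention(x, Wq, Wk, Wv):
    """Single-head global self-attention over the visual word sequence."""
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    scores = softmax(q @ k.T / np.sqrt(k.shape[-1]), axis=-1)
    return scores @ v

# Toy dual-branch feature maps: 8 channels over a 4x4 spatial grid.
C, H, W, d = 8, 4, 4, 16
f1, f2 = rng.normal(size=(C, H, W)), rng.normal(size=(C, H, W))
fused = attention_fusion(f1, f2)
tokens = to_visual_words(fused, rng.normal(size=(C, d)))
out = self_attention(tokens, *(rng.normal(size=(d, d)) for _ in range(3)))
print(out.shape)  # (16, 16): one attended token per spatial location
```

The fusion weights sum to one at every pixel, so the fused map stays on the same scale as its inputs; the self-attention step then lets every visual word attend to all others, which is the "global" modelling the abstract contrasts with purely convolutional receptive fields.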
DOI | 10.1109/CVIDLICCEA56201.2022.9824452