POST: Prototype‐oriented similarity transfer framework for cross‐domain facial expression recognition

Bibliographic Details
Published in: Computer Animation and Virtual Worlds, Vol. 35, No. 3
Main Authors: Guo, Zhe; Wei, Bingxin; Cai, Qinglin; Liu, Jiayi; Wang, Yi
Format: Journal Article
Language: English
Published: Chichester: Wiley Subscription Services, Inc., 01.05.2024

Summary: Facial expression recognition (FER) is one of the most popular research topics in computer vision. Most deep learning expression recognition methods perform well on a single dataset but may struggle in cross-domain FER applications when applied to different datasets. Cross-dataset FER also suffers from difficulties such as feature distribution deviation and discriminator degradation. To address these issues, we propose a prototype-oriented similarity transfer framework (POST) for cross-domain FER. The bidirectional cross-attention Swin Transformer (BCS Transformer) module is designed to aggregate local facial feature similarities across different domains, enabling the extraction of relevant cross-domain features. The dual learnable category prototypes are designed to represent latent-space samples for both the source and target domains, ensuring enhanced domain alignment by leveraging both cross-domain and domain-specific features. We further introduce the self-training resampling (STR) strategy to enhance similarity transfer. Experimental results with the RAF-DB dataset as the source domain and the CK+, FER2013, JAFFE, and SFEW 2.0 datasets as the target domains show that our approach achieves much higher performance than state-of-the-art cross-domain FER methods.
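
The summary only describes the BCS Transformer at a high level. As a rough illustration of the bidirectional cross-attention idea it names, the minimal PyTorch sketch below lets source-domain and target-domain patch features attend to each other in both directions. This is an assumption-laden toy example: the class name, feature dimensions, and use of nn.MultiheadAttention are illustrative choices and are not taken from the paper.

# Minimal sketch of bidirectional cross-attention between source- and
# target-domain feature sequences (illustrative only; not the paper's
# actual BCS Transformer implementation). Shapes and names are hypothetical.
import torch
import torch.nn as nn


class BidirectionalCrossAttention(nn.Module):
    def __init__(self, dim: int = 256, num_heads: int = 4):
        super().__init__()
        # Source queries attend to target keys/values, and vice versa.
        self.src_to_tgt = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.tgt_to_src = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm_src = nn.LayerNorm(dim)
        self.norm_tgt = nn.LayerNorm(dim)

    def forward(self, src_tokens: torch.Tensor, tgt_tokens: torch.Tensor):
        # src_tokens, tgt_tokens: (batch, num_patches, dim) local facial features.
        src_attended, _ = self.src_to_tgt(src_tokens, tgt_tokens, tgt_tokens)
        tgt_attended, _ = self.tgt_to_src(tgt_tokens, src_tokens, src_tokens)
        # Residual connections keep domain-specific features alongside the
        # cross-domain similarities aggregated by attention.
        return (self.norm_src(src_tokens + src_attended),
                self.norm_tgt(tgt_tokens + tgt_attended))


if __name__ == "__main__":
    src = torch.randn(2, 49, 256)   # source-domain patch features
    tgt = torch.randn(2, 49, 256)   # target-domain patch features
    out_src, out_tgt = BidirectionalCrossAttention()(src, tgt)
    print(out_src.shape, out_tgt.shape)  # both (2, 49, 256)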
ISSN: 1546-4261, 1546-427X
DOI: 10.1002/cav.2260