Landmarks-Driven Triplet Representation for Facial Expression Similarity

Bibliographic Details
Published in	Journal of Donghua University (English Edition) (东华大学学报（英文版）), Vol. 40, no. 1, pp. 34-44
Main Authors	ZHOU Yirun, FENG Xiangyang, ZHU Ming
Format	Journal Article
Language	English
Published	School of Computer Science and Technology, Donghua University, Shanghai 201620, China, 2023
Subjects	triplet network; facial landmark; feature optimization; attention mechanism; facial expression similarity
Summary:	CLC number: TP183. Facial landmarks can provide valuable information for expression-related tasks. However, most approaches use landmarks only for segmentation preprocessing or feed them directly into the fully connected layers of a neural network. Such a simple combination not only fails to pass spatial information to the network but also increases the amount of computation. The method proposed in this paper integrates a facial landmarks-driven representation into the triplet network. The spatial information provided by the landmarks is introduced into the feature extraction process so that the model can better capture positional relationships. In addition, coordinate information is integrated into the triplet loss calculation to further improve similarity prediction. Specifically, for each image, the coordinates of 68 landmarks are detected, and a region attention map is generated from these landmarks. The feature map output by the shallow convolutional layer is multiplied by the attention map to correct the feature activations, strengthening key regions and weakening unimportant ones. The optimized embedding output can then be used for downstream tasks. The three embeddings that the network outputs for three images can be regarded as a triplet representation for similarity computation. The effectiveness of this optimized feature extraction is verified on the CK+ dataset, after which it is applied to facial expression similarity tasks. The results on the facial expression comparison (FEC) dataset show that accuracy improves significantly once the landmark information is introduced.
ISSN:	1672-5220
DOI:	10.19884/j.1672-5220.202110005
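
Illustrative sketch (not from the paper): the summary describes building a region attention map from 68 landmark coordinates, multiplying it with a shallow feature map, and comparing three embeddings with a triplet loss. The NumPy code below is one possible reading of those steps. The Gaussian form of the attention map, the `sigma` and `margin` values, and all function names are assumptions made for illustration; the summary does not say how the map is constructed or how coordinate information enters the loss, so only the standard triplet loss is shown.

```python
import numpy as np


def landmark_attention_map(landmarks, height, width, sigma=2.0):
    """Build an (H, W) attention map in [0, 1] from (N, 2) landmark (x, y) coordinates.

    Assumption: each landmark contributes an isotropic Gaussian bump and the
    per-pixel maximum over landmarks is kept. This is a hypothetical choice.
    """
    landmarks = np.asarray(landmarks, dtype=float)
    ys, xs = np.mgrid[0:height, 0:width]
    # Squared distance from every pixel to every landmark: shape (N, H, W).
    d2 = (xs[None] - landmarks[:, 0, None, None]) ** 2 \
        + (ys[None] - landmarks[:, 1, None, None]) ** 2
    return np.exp(-d2 / (2.0 * sigma ** 2)).max(axis=0)


def apply_attention(feature_map, attention):
    """Reweight a (C, H, W) feature map by an (H, W) attention map.

    The same spatial weights are broadcast over all channels, so activations
    near landmarks are kept and activations far from them are attenuated.
    """
    return feature_map * attention[None, :, :]


def triplet_loss(anchor, positive, negative, margin=0.2):
    """Standard hinge triplet loss on squared Euclidean distances.

    The coordinate-aware term mentioned in the summary is omitted because
    its form is not specified there.
    """
    d_pos = float(np.sum((anchor - positive) ** 2))
    d_neg = float(np.sum((anchor - negative) ** 2))
    return max(0.0, d_pos - d_neg + margin)


# Usage with synthetic data (random landmarks and features, for shape checks only).
rng = np.random.default_rng(0)
landmarks = rng.uniform(0, 16, size=(68, 2))      # 68 (x, y) points on a 16x16 grid
attention = landmark_attention_map(landmarks, 16, 16)
features = rng.standard_normal((8, 16, 16))       # stand-in for a shallow conv output
weighted = apply_attention(features, attention)
```

In this sketch the attention map has the same spatial size as the feature map; in practice the landmark coordinates would need to be rescaled to the feature map's resolution before the multiplication.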