Dual-branch adaptive attention transformer for occluded person re-identification

Bibliographic Details
Published in	Image and vision computing Vol. 131; p. 104633
Main Authors	Lu, Yunhua, Jiang, Mingzi, Liu, Zhi, Mu, Xinyu
Format	Journal Article
Language	English
Published	Elsevier B.V. 01.03.2023
Subjects	Metric learning; Multi-headed self-attention; Person re-identification; Transformer
Online Access	Get full text
ISSN	0262-8856 1872-8138
DOI	10.1016/j.imavis.2023.104633

Summary:	• An end-to-end dual-branch vision transformer for occluded person re-identification is proposed.
• Local features of the human body are extracted adaptively using a self-attention mechanism.
• A Goal Consistency Loss with more consistent convergence goals is designed.
• State-of-the-art performance is achieved on the Occluded-REID dataset.

Occluded person re-identification (Re-ID) remains a common and challenging task because, in the real world, people are often occluded by obstacles (e.g., cars and trees). To locate the unoccluded parts and extract fine-grained local features of the occluded human body, state-of-the-art (SOTA) methods usually rely on a pose estimation model. This can introduce additional bias, and the resulting two-stage architecture complicates the model. To solve this problem, an end-to-end dual-branch Transformer network for occluded person re-identification is designed. One branch is a transformer-based global branch responsible for extracting global features. The other is a local branch built around the Selective Token Attention (STA) module, which uses the multi-headed self-attention mechanism to select discriminative tokens and thereby extract local features effectively. Further, to alleviate the inconsistency between the convergence goals of Softmax Loss and Triplet Loss, Circle Loss is introduced to design the Goal Consistency Loss (GC Loss) that supervises the network. Experiments on four challenging Re-ID datasets, covering both occluded and holistic person Re-ID, show that the method achieves SOTA performance.
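The two ideas named in the summary, attention-driven token selection and Circle Loss, can be illustrated with a minimal Python sketch. It is not the authors' implementation. The record does not say how STA ranks tokens or how GC Loss combines its terms, so `select_tokens` assumes a simple rule (keep the k patch tokens with the highest head-averaged class-token attention), and `circle_loss` shows only the standard published Circle Loss formulation that GC Loss is said to build on. All function names, parameter names, and the default `margin` and `gamma` values are illustrative assumptions.

```python
import math


def select_tokens(cls_attention, k):
    """Return indices of the k patch tokens with the highest mean attention.

    cls_attention: one list per attention head, each holding the attention
    weight from the class token to every patch token.
    Assumption: ranking by head-averaged attention; the record does not
    describe the actual STA selection rule.
    """
    num_heads = len(cls_attention)
    num_patches = len(cls_attention[0])
    mean_attn = [
        sum(head[j] for head in cls_attention) / num_heads
        for j in range(num_patches)
    ]
    ranked = sorted(range(num_patches), key=lambda j: mean_attn[j], reverse=True)
    return ranked[:k]


def _logsumexp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    peak = max(values)
    return peak + math.log(sum(math.exp(v - peak) for v in values))


def _softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def circle_loss(sp, sn, margin=0.25, gamma=64.0):
    """Standard Circle Loss for one anchor.

    sp: similarities to same-identity samples (should be pushed toward 1).
    sn: similarities to different-identity samples (should be pushed toward 0).
    margin and gamma are illustrative values, not settings from the paper.
    """
    optimum_p, optimum_n = 1.0 + margin, -margin
    delta_p, delta_n = 1.0 - margin, margin
    # Each similarity gets its own non-negative weight, larger the further
    # it is from its optimum.
    logits_p = [-gamma * max(optimum_p - s, 0.0) * (s - delta_p) for s in sp]
    logits_n = [gamma * max(s - optimum_n, 0.0) * (s - delta_n) for s in sn]
    return _softplus(_logsumexp(logits_p) + _logsumexp(logits_n))


if __name__ == "__main__":
    heads = [[0.1, 0.6, 0.3], [0.2, 0.5, 0.3]]
    print(select_tokens(heads, k=2))
    print(circle_loss(sp=[0.9, 0.8], sn=[0.1, 0.2]))
```

In this sketch a well-separated anchor (high `sp`, low `sn`) yields a loss close to zero, while overlapping similarities yield a large loss, which is the behaviour a pair-similarity objective of this kind is meant to have.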