Multi-Scale Explicit Matching and Mutual Subject Teacher Learning for Generalizable Person Re-Identification
Published in: IEEE Transactions on Circuits and Systems for Video Technology, Vol. 34, No. 9, pp. 8881-8895
Main Authors: , , ,
Format: Journal Article
Language: English
Published: IEEE, 01.09.2024
Summary: Domain-generalizable person re-identification (DG-ReID) is among the most challenging and practically important branches of the ReID field, as it enables the direct deployment of pre-trained models in unseen, real-world scenarios. Recent works have made significant progress on this task via the image-matching paradigm, which searches for local correspondences in the feature maps. Pixel-wise matching is commonly adopted to keep the matching efficient; this, however, makes the matching susceptible to deviations caused by identity-irrelevant pixel features. Patch-wise matching, on the other hand, disregards the spatial orientation of pedestrians and amplifies the impact of noise. To address these issues, this paper proposes the Multi-Scale Query-Adaptive Convolution (QAConv-MS) framework, which encodes patches in the feature maps to pixels using template kernels of various scales, so that the matching process benefits from broader receptive fields and robustness to orientation changes and noise. To stabilize the matching process and let each sub-kernel within the template kernels independently learn to capture diverse local patterns, we propose the OrthoGonal Norm (OGNorm), which consists of two orthogonal normalizations. We also present Mutual Subject Teacher Learning (MSTL) to address potential overconfidence and overfitting in the model: the two models individually select the most challenging data for training, yielding more dependable soft labels that provide mutual supervision. Extensive experiments conducted in both single-source and multi-source setups offer compelling evidence of our framework's generalization and competitiveness.
ISSN: 1051-8215, 1558-2205
DOI: 10.1109/TCSVT.2024.3382322
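The core mechanism described in the abstract, building convolution templates from one image's feature map at several patch scales and scanning them over the other image's feature map to find local correspondences, can be illustrated with a minimal sketch. The snippet below is an assumption-laden illustration, not the authors' released code: the scale set (1, 3, 5), the cosine-style template normalization, the max-then-mean score pooling, and the helper name `multi_scale_match_score` are all placeholders for whatever the paper actually uses, and OGNorm and MSTL are not modeled here.

```python
# Minimal sketch of multi-scale patch-to-pixel query-adaptive matching.
# Assumes PyTorch feature maps of shape (C, H, W); all design choices
# below (scales, normalization, score pooling) are illustrative guesses.
import torch
import torch.nn.functional as F

def multi_scale_match_score(query_feat, gallery_feat, scales=(1, 3, 5)):
    """Return a scalar similarity between two feature maps by using
    query patches at several scales as convolution templates."""
    scores = []
    for k in scales:
        pad = k // 2
        # Extract k x k patches from the query map; each patch becomes a
        # "template kernel" to be matched against the gallery map.
        patches = F.unfold(query_feat.unsqueeze(0), kernel_size=k, padding=pad)   # (1, C*k*k, H*W)
        templates = patches.squeeze(0).t().reshape(-1, query_feat.size(0), k, k)  # (H*W, C, k, k)
        # L2-normalize each template so responses behave like cosine similarities.
        templates = F.normalize(templates.reshape(templates.size(0), -1), dim=1).reshape_as(templates)
        # Slide every template over the gallery feature map.
        response = F.conv2d(gallery_feat.unsqueeze(0), templates, padding=pad)    # (1, H*W, H, W)
        # Keep each template's best local correspondence, then average over
        # templates to get one score for this scale.
        best = response.flatten(2).max(dim=2).values                              # (1, H*W)
        scores.append(best.mean())
    # A simple mean over scales stands in for whatever fusion the paper uses.
    return torch.stack(scores).mean()

# Usage: compare two channel-normalized backbone feature maps.
q = F.normalize(torch.randn(256, 24, 8), dim=0)
g = F.normalize(torch.randn(256, 24, 8), dim=0)
print(multi_scale_match_score(q, g).item())
```

Compared with a purely pixel-wise (1x1) scheme, the larger kernel sizes in this sketch give each template a wider receptive field, which is the intuition the abstract gives for robustness to identity-irrelevant pixel features and noise.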