Camera-Based Batch Normalization: An Effective Distribution Alignment Method for Person Re-Identification

Bibliographic Details
Published in: IEEE Transactions on Circuits and Systems for Video Technology, Vol. 32, no. 1, pp. 374-387
Main Authors: Zhuang, Zijie; Wei, Longhui; Xie, Lingxi; Ai, Haizhou; Tian, Qi
Format: Journal Article
Language: English
Published: New York, IEEE, 01.01.2022
The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Summary: Person re-identification (ReID) aims at matching identities across disjoint cameras. Its fundamental difficulty lies in associating images across individual cameras, where a key clue, i.e., identity appearance, is prone to the environmental factors of cameras and, consequently, subject to distinct image distributions due to the environmental differences between cameras. To associate images from training cameras, ReID methods strongly demand expensive inter-camera annotations for learning the relations between the distribution of these cameras, yet trained models are still not guaranteed to transfer well to unseen cameras. This problem significantly limits the application of ReID. This paper rethinks the working mechanism of conventional ReID approaches and puts forward a new solution. With an effective operator named Camera-based Batch Normalization (CBN), we guarantee an invariant input distribution independent of all cameras. Thus, the training and testing procedures are always conducted under the same input distribution. This alignment brings three benefits. First, ReID models enjoy better abilities to generalize across testing scenarios with unseen cameras and transfer across multiple training sets. Second, it makes better use of intra-camera annotations, which have been undervalued before due to the lack of cross-camera information. Ideally, the cost of inter-camera annotations can be largely reduced. Third, cross-modality tasks can be better defined through aligning visible/infrared cameras' distributions. Experiments on a wide range of ReID tasks demonstrate the effectiveness of our approach.
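Illustrative sketch (not part of the catalog record, and not the authors' implementation): the summary describes aligning per-camera input distributions. One simple way to picture that idea is to standardize each feature dimension using only the statistics of samples from the same camera. The function name camera_standardize, the toy data, and the eps value below are assumptions made for illustration; the sketch omits learnable scale/shift parameters and any details of how the paper applies the operator inside a network.

```python
import math
import random
from collections import defaultdict
from statistics import fmean, pvariance


def camera_standardize(features, camera_ids, eps=1e-5):
    """Standardize each feature dimension with the statistics of the
    sample's own camera (hypothetical helper, for illustration only)."""
    # Group sample indices by the camera that captured them.
    groups = defaultdict(list)
    for idx, cam in enumerate(camera_ids):
        groups[cam].append(idx)

    out = [None] * len(features)
    for cam, indices in groups.items():
        dims = len(features[indices[0]])
        # Per-camera mean and (population) variance for every dimension.
        means = [fmean(features[i][d] for i in indices) for d in range(dims)]
        variances = [
            pvariance([features[i][d] for i in indices], mu=means[d])
            for d in range(dims)
        ]
        # Normalize each sample with its own camera's statistics.
        for i in indices:
            out[i] = [
                (features[i][d] - means[d]) / math.sqrt(variances[d] + eps)
                for d in range(dims)
            ]
    return out


# Toy data: two "cameras" whose features follow very different distributions.
random.seed(0)
cam_a = [[random.gauss(5.0, 2.0) for _ in range(4)] for _ in range(64)]
cam_b = [[random.gauss(-3.0, 0.5) for _ in range(4)] for _ in range(64)]
features = cam_a + cam_b
camera_ids = [0] * 64 + [1] * 64

aligned = camera_standardize(features, camera_ids)
```

After this step, the samples from each toy camera have approximately zero mean and unit variance in every dimension, so both groups share the same first- and second-order statistics regardless of their original offsets and scales.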
ISSN: 1051-8215, 1558-2205
DOI: 10.1109/TCSVT.2021.3058111