A camera style-invariant learning and channel interaction enhancement fusion network for visible-infrared person re-identification

Bibliographic Details
Published in: Machine Vision and Applications, Vol. 34, No. 6, p. 117
Main Authors: Du, Haishun; Hao, Xinxin; Ye, Yanfang; He, Linbing; Guo, Jiangtao
Format: Journal Article
Language: English
Published: Berlin/Heidelberg: Springer Berlin Heidelberg (Springer Nature B.V.), 01.11.2023
Summary: Cross-modality visible-infrared person re-identification (VI-ReID) aims to match visible and infrared pedestrian images captured by different cameras in various scenarios. However, most existing VI-ReID methods focus only on eliminating the modality discrepancy while ignoring the intra-class discrepancy caused by different camera styles. In addition, some fusion-based VI-ReID methods try to improve the discriminative capability of pedestrian representations by fusing pedestrian features from different convolutional layers or branches. However, most of them implement feature fusion only with simple operations, such as summation or concatenation, and ignore the interaction between the feature maps. To this end, we propose a camera style-invariant learning and channel interaction enhancement fusion network (CC-Net) for VI-ReID. In particular, we design a channel interaction enhancement fusion module. It first computes the channel-level similarity matrix of two feature maps and uses it to obtain two weighted feature maps that enhance the information both original maps jointly attend to. It then obtains more discriminative pedestrian features by fusing the two weighted feature maps and mining their complementary information. Furthermore, to weaken the impact of the camera style discrepancy of pedestrian images, we design a camera style-invariant feature-level adversarial learning strategy: through adversarial learning between the feature extraction network and a camera style classifier, the feature extraction network is driven to extract camera style-invariant pedestrian features. Extensive experimental results on two benchmark datasets, SYSU-MM01 and RegDB, demonstrate that CC-Net achieves performance competitive with recent state-of-the-art methods.
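
The abstract describes the channel interaction enhancement fusion module only at a high level. The sketch below is one plausible PyTorch reading of it, not the authors' implementation: a channel-level similarity matrix between two feature maps produces two weighted maps, which are then fused. The softmax weighting, residual additions, and 1x1 fusion convolution are all assumptions for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ChannelInteractionFusion(nn.Module):
    """Sketch of a channel interaction enhancement fusion step: weight two
    feature maps by their channel-level similarity matrix, then fuse the
    weighted maps to mine complementary information."""

    def __init__(self, channels: int):
        super().__init__()
        # Assumed fusion head: project the concatenated maps back to `channels`.
        self.fuse = nn.Conv2d(2 * channels, channels, kernel_size=1)

    def forward(self, x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        xf = x.view(b, c, -1)  # (B, C, H*W) channel descriptors
        yf = y.view(b, c, -1)
        # Channel-level similarity matrix between the two maps: (B, C, C).
        sim = torch.bmm(F.normalize(xf, dim=2), F.normalize(yf, dim=2).transpose(1, 2))
        # Re-weight each map with the other's channels, emphasising the
        # information the two maps jointly attend to; keep a residual path.
        x_w = x + torch.bmm(F.softmax(sim, dim=2), yf).view(b, c, h, w)
        y_w = y + torch.bmm(F.softmax(sim.transpose(1, 2), dim=2), xf).view(b, c, h, w)
        # Fuse the two weighted maps into one discriminative feature map.
        return self.fuse(torch.cat([x_w, y_w], dim=1))
```

With `x` and `y` taken from, e.g., two branches of the backbone, `ChannelInteractionFusion(c)(x, y)` returns a single fused map of the same shape as `x`.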
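The abstract likewise does not say how the feature-level adversarial learning between the feature extractor and the camera style classifier is realised. One common device for this kind of min-max training is a gradient reversal layer; the hypothetical sketch below uses it, though the paper may instead use alternating updates. The head architecture and `lamb` scaling are assumptions.

```python
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    """Gradient reversal layer: identity in the forward pass, negated
    (scaled) gradient in the backward pass."""

    @staticmethod
    def forward(ctx, x, lamb):
        ctx.lamb = lamb
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_out):
        return -ctx.lamb * grad_out, None

class CameraStyleClassifier(nn.Module):
    """Hypothetical camera style head: predicts which camera produced a
    pedestrian feature vector. Because its gradient is reversed before
    reaching the backbone, minimising its loss pushes the backbone toward
    camera style-invariant features."""

    def __init__(self, feat_dim: int, num_cameras: int, lamb: float = 1.0):
        super().__init__()
        self.lamb = lamb
        self.head = nn.Sequential(
            nn.Linear(feat_dim, feat_dim // 2),
            nn.ReLU(inplace=True),
            nn.Linear(feat_dim // 2, num_cameras),
        )

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        return self.head(GradReverse.apply(feats, self.lamb))

# Usage sketch: the camera-ID cross-entropy flows back reversed into the
# feature extractor, so the extractor learns to confuse the classifier.
# feats = backbone(images)                # (B, feat_dim)
# cam_logits = camera_head(feats)         # (B, num_cameras)
# loss = nn.functional.cross_entropy(cam_logits, camera_ids)
```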
ISSN: 0932-8092 (print)
eISSN: 1432-1769
DOI: 10.1007/s00138-023-01473-4