A camera style-invariant learning and channel interaction enhancement fusion network for visible-infrared person re-identification
Published in: Machine Vision and Applications, Vol. 34, No. 6, p. 117
Format: Journal Article
Language: English
Published: Berlin/Heidelberg: Springer Berlin Heidelberg (Springer Nature B.V.), 01.11.2023
Summary: Cross-modality visible-infrared person re-identification (VI-ReID) aims to match visible and infrared pedestrian images captured by different cameras in various scenarios. However, most existing VI-ReID methods focus only on eliminating the modality discrepancy while ignoring the intra-class discrepancy caused by different camera styles. In addition, some feature-fusion-based VI-ReID methods try to improve the discriminative capability of pedestrian representations by fusing pedestrian features from different convolutional layers or branches, but most implement the fusion with simple operations, such as summation or concatenation, and ignore the interaction between the feature maps. To this end, we propose a camera style-invariant learning and channel interaction enhancement fusion network (CC-Net) for VI-ReID. In particular, we design a channel interaction enhancement fusion module: it first computes the channel-level similarity matrix of two feature maps and uses it to obtain two corresponding weighted feature maps that enhance the information both original maps attend to; it then obtains more discriminative pedestrian features by fusing the two weighted feature maps and mining their complementary information. Furthermore, to weaken the impact of the camera style discrepancy of pedestrian images, we design a camera style-invariant feature-level adversarial learning strategy: adversarial learning between the feature extraction network and a camera style classifier ensures that the network extracts camera style-invariant pedestrian features. Extensive experiments on two benchmark datasets, SYSU-MM01 and RegDB, demonstrate that CC-Net performs on par with recent state-of-the-art methods.
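The abstract's channel interaction enhancement fusion module can be sketched as follows. The paper only states that a channel-level similarity matrix between two feature maps is used to re-weight each map before fusion; the softmax normalisation, the bilinear form of the weighting, and the residual summation used for fusion below are assumptions, not details from the paper.

```python
import torch
import torch.nn.functional as F


def channel_interaction_fusion(x1: torch.Tensor, x2: torch.Tensor) -> torch.Tensor:
    """Sketch of channel-interaction-based fusion of two feature maps.

    x1, x2: feature maps of shape (B, C, H, W) from two branches/layers.
    """
    b, c, h, w = x1.shape
    f1 = x1.reshape(b, c, -1)                     # (B, C, H*W)
    f2 = x2.reshape(b, c, -1)

    # Channel-level similarity matrix between the two maps: (B, C, C).
    sim = torch.bmm(f1, f2.transpose(1, 2))

    # Re-weight each map with the (softmax-normalised) similarity so that
    # channels both maps jointly attend to are enhanced (assumed form).
    w1 = torch.bmm(F.softmax(sim, dim=-1), f2).reshape(b, c, h, w)
    w2 = torch.bmm(F.softmax(sim.transpose(1, 2), dim=-1), f1).reshape(b, c, h, w)

    # Fuse the weighted maps with the originals (assumed residual summation).
    return (x1 + w1) + (x2 + w2)
```

A simple elementwise sum would discard cross-map structure; routing each map through the shared similarity matrix is what lets the fusion exploit the interaction between the two feature maps that the abstract emphasises.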
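The feature-level adversarial strategy described above pits the feature extractor against a camera style classifier. A common way to realise such feature-level adversaries is a gradient reversal layer; the sketch below assumes that device, and the layer sizes and camera count are illustrative, not taken from the paper.

```python
import torch
import torch.nn as nn


class GradReverse(torch.autograd.Function):
    """Identity in the forward pass; negates (and scales) gradients in the
    backward pass, so the backbone is trained to fool the classifier."""

    @staticmethod
    def forward(ctx, x, lamb):
        ctx.lamb = lamb
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lamb * grad_output, None


class CameraStyleAdversary(nn.Module):
    """Camera style classifier trained adversarially against the feature
    extractor (hypothetical head; dimensions are illustrative)."""

    def __init__(self, feat_dim: int = 2048, num_cameras: int = 6, lamb: float = 1.0):
        super().__init__()
        self.lamb = lamb
        self.classifier = nn.Sequential(
            nn.Linear(feat_dim, 512),
            nn.ReLU(inplace=True),
            nn.Linear(512, num_cameras),
        )

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # Reversed gradients push the backbone toward features from which
        # the source camera cannot be predicted, i.e. camera style-invariant.
        return self.classifier(GradReverse.apply(feats, self.lamb))
```

Training then minimises the camera classification loss with respect to the classifier while, through the reversed gradients, maximising it with respect to the feature extractor.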
ISSN: 0932-8092, 1432-1769
DOI: 10.1007/s00138-023-01473-4