A camera style-invariant learning and channel interaction enhancement fusion network for visible-infrared person re-identification
Published in: Machine Vision and Applications, Vol. 34, No. 6, p. 117
Format: Journal Article
Language: English
Published: Berlin/Heidelberg: Springer Berlin Heidelberg (Springer Nature B.V.), 01.11.2023
Summary: Cross-modality visible-infrared person re-identification (VI-ReID) aims to match visible and infrared pedestrian images captured by different cameras in various scenarios. However, most existing VI-ReID methods focus only on eliminating the modality discrepancy while ignoring the intra-class discrepancy caused by different camera styles. In addition, some feature-fusion-based VI-ReID methods try to improve the discriminative capability of pedestrian representations by fusing pedestrian features from different convolutional layers or branches, but most implement the fusion with simple operations, such as summation or concatenation, and ignore the interaction between the feature maps. To this end, we propose a camera style-invariant learning and channel interaction enhancement fusion network (CC-Net) for VI-ReID. In particular, we design a channel interaction enhancement fusion module: it first computes the channel-level similarity matrix of two feature maps and uses it to obtain two corresponding weighted feature maps that enhance the information both original maps attend to; it then obtains more discriminative pedestrian features by fusing the two weighted feature maps and mining their complementary information. Furthermore, to weaken the impact of the camera style discrepancy of pedestrian images, we design a camera style-invariant feature-level adversarial learning strategy: adversarial learning between the feature extraction network and a camera style classifier ensures that the network extracts camera style-invariant pedestrian features. Extensive experiments on two benchmark datasets, SYSU-MM01 and RegDB, demonstrate that CC-Net performs on par with recent state-of-the-art methods.
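The abstract's channel interaction enhancement fusion module can be sketched as follows. The paper only states that a channel-level similarity matrix between two feature maps is used to re-weight each map before fusion; the softmax normalisation, the bilinear form of the weighting, and the residual summation used for fusion below are assumptions, not details from the paper.

```python
import torch
import torch.nn.functional as F


def channel_interaction_fusion(x1: torch.Tensor, x2: torch.Tensor) -> torch.Tensor:
    """Sketch of channel-interaction-based fusion of two feature maps.

    x1, x2: feature maps of shape (B, C, H, W) from two branches/layers.
    """
    b, c, h, w = x1.shape
    f1 = x1.reshape(b, c, -1)                     # (B, C, H*W)
    f2 = x2.reshape(b, c, -1)

    # Channel-level similarity matrix between the two maps: (B, C, C).
    sim = torch.bmm(f1, f2.transpose(1, 2))

    # Re-weight each map with the (softmax-normalised) similarity so that
    # channels both maps jointly attend to are enhanced (assumed form).
    w1 = torch.bmm(F.softmax(sim, dim=-1), f2).reshape(b, c, h, w)
    w2 = torch.bmm(F.softmax(sim.transpose(1, 2), dim=-1), f1).reshape(b, c, h, w)

    # Fuse the weighted maps with the originals (assumed residual summation).
    return (x1 + w1) + (x2 + w2)
```

A simple elementwise sum would discard cross-map structure; routing each map through the shared similarity matrix is what lets the fusion exploit the interaction between the two feature maps that the abstract emphasises.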
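The feature-level adversarial strategy described above pits the feature extractor against a camera style classifier. A common way to realise such feature-level adversaries is a gradient reversal layer; the sketch below assumes that device, and the layer sizes and camera count are illustrative, not taken from the paper.

```python
import torch
import torch.nn as nn


class GradReverse(torch.autograd.Function):
    """Identity in the forward pass; negates (and scales) gradients in the
    backward pass, so the backbone is trained to fool the classifier."""

    @staticmethod
    def forward(ctx, x, lamb):
        ctx.lamb = lamb
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lamb * grad_output, None


class CameraStyleAdversary(nn.Module):
    """Camera style classifier trained adversarially against the feature
    extractor (hypothetical head; dimensions are illustrative)."""

    def __init__(self, feat_dim: int = 2048, num_cameras: int = 6, lamb: float = 1.0):
        super().__init__()
        self.lamb = lamb
        self.classifier = nn.Sequential(
            nn.Linear(feat_dim, 512),
            nn.ReLU(inplace=True),
            nn.Linear(512, num_cameras),
        )

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # Reversed gradients push the backbone toward features from which
        # the source camera cannot be predicted, i.e. camera style-invariant.
        return self.classifier(GradReverse.apply(feats, self.lamb))
```

Training then minimises the camera classification loss with respect to the classifier while, through the reversed gradients, maximising it with respect to the feature extractor.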
ISSN: 0932-8092, 1432-1769
DOI: 10.1007/s00138-023-01473-4