Revisiting Modality-Specific Feature Compensation for Visible-Infrared Person Re-Identification

Bibliographic Details
Published in: IEEE Transactions on Circuits and Systems for Video Technology, Vol. 32, No. 10, pp. 7226-7240
Main Authors: Liu, Jianan; Wang, Jialiang; Huang, Nianchang; Zhang, Qiang; Han, Jungong
Format: Journal Article
Language: English
Published: New York: The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 01.10.2022
Summary: Although modality-specific feature compensation has become a prevailing paradigm for feature learning in Visible-Infrared Person Re-Identification (VI-ReID), its performance is not promising, especially when compared with modality-shared feature learning. In this paper, by revisiting modality-specific feature compensation based models, we reveal the reasons for their under-performance: (1) images generated for one modality from the other modality may be of poor quality; (2) existing models usually achieve modality-specific feature compensation only via simple pixel-level fusion strategies; (3) generated images cannot fully replace the corresponding missing ones, which introduces extra modality discrepancy. To address these issues, we propose a new Two-Stage Modality Enhancement Network (TSME) for VI-ReID. Concretely, it first accounts for the modality discrepancy in cross-modality style translation and optimizes the structure of the image generators with a new Deeper Skip-connection Generative Adversarial Network (DSGAN) to generate high-quality images. It then presents an attention-based feature-level fusion module, i.e., the Pair-wise Image Fusion (PwIF) module, and an auxiliary learning module, i.e., the Invoking All-Images (IAI) module, to better exploit the generated and original images for reducing modality discrepancy from the perspectives of feature fusion and feature constraints, respectively. Comprehensive experiments demonstrate the success of TSME in tackling the modality discrepancy issue in VI-ReID.
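
To make the idea of attention-based, feature-level fusion of original and generated images concrete, the following PyTorch sketch shows a minimal pair-wise fusion module with channel attention. It is an illustrative assumption only: the class name PairwiseAttentionFusion, the reduction ratio, and the overall layout are hypothetical and are not the paper's actual PwIF implementation.

# Minimal sketch (assumption, not the paper's PwIF module): fuse features of an
# original image and its cross-modality generated counterpart with learned
# per-channel attention weights instead of simple pixel-level fusion.
import torch
import torch.nn as nn


class PairwiseAttentionFusion(nn.Module):
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)
        # Predict per-channel weights for the two feature maps jointly.
        self.attn = nn.Sequential(
            nn.Conv2d(2 * channels, channels // reduction, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, 2 * channels, kernel_size=1),
        )

    def forward(self, feat_orig: torch.Tensor, feat_gen: torch.Tensor) -> torch.Tensor:
        # feat_orig, feat_gen: (B, C, H, W) features of the original image and
        # of the generated cross-modality image, respectively.
        joint = torch.cat([feat_orig, feat_gen], dim=1)        # (B, 2C, H, W)
        weights = torch.sigmoid(self.attn(self.pool(joint)))   # (B, 2C, 1, 1)
        w_orig, w_gen = torch.chunk(weights, 2, dim=1)          # (B, C, 1, 1) each
        return w_orig * feat_orig + w_gen * feat_gen            # weighted fusion


if __name__ == "__main__":
    fusion = PairwiseAttentionFusion(channels=256)
    f_vis = torch.randn(4, 256, 24, 12)   # visible-image features
    f_gen = torch.randn(4, 256, 24, 12)   # features of a generated infrared view
    print(fusion(f_vis, f_gen).shape)     # torch.Size([4, 256, 24, 12])

Compared with pixel-level averaging of the original and generated images, such a feature-level scheme lets the network down-weight channels that are unreliable in low-quality generated images.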
ISSN: 1051-8215, 1558-2205
DOI: 10.1109/TCSVT.2022.3168999