Homogeneous and Heterogeneous Feature Learning Based on Large Models for Unsupervised Text-to-Image Person Re-Identification
Published in | 2025 IEEE 2nd International Conference on Deep Learning and Computer Vision (DLCV) pp. 1 - 5 |
---|---|
Main Authors | , , , |
Format | Conference Proceeding |
Language | English |
Published | IEEE, 06.06.2025 |
Subjects | |
DOI | 10.1109/DLCV65218.2025.11088688 |
Summary: | Text-to-image person re-identification (TIReID) aims to identify and retrieve target pedestrians according to given textual queries. Driven by large amounts of annotated data, existing supervised learning methods have achieved promising performance. However, manually annotating large-scale databases is extremely time-consuming and impractical, which restricts their application in real-world scenarios. Several methods fine-tune a single multimodal large language model (MLLM) to construct cross-modality databases and employ a contrastive loss to constrain sample features. However, they neglect the reliability of the text generation and feature optimization processes. To this end, we propose Homogeneous and Heterogeneous Feature Learning based on Large Models (HHLLM) for the unsupervised TIReID task. First, we design a text generation process that jointly employs multiple large models, leveraging the diverse strengths of different MLLMs to generate and filter reliable texts for constructing image-text matching relationships. Second, we introduce an adapter-based learning strategy to transfer image-text prior knowledge and enhance feature representation capability. Furthermore, we construct a Homogeneous and Heterogeneous Feature Learning (HHFL) process, which continuously optimizes intra-modality and inter-modality features from both class and instance views. We perform extensive experiments on benchmark TIReID databases to evaluate HHLLM. The experimental results demonstrate that our method achieves state-of-the-art performance among unsupervised methods. |
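The summary does not give the HHFL loss formulas. The following is a minimal pure-Python sketch, built entirely on our own assumptions, of how contrastive terms from class and instance views could be combined across and within modalities: a CLIP-style symmetric InfoNCE over matched image-text pairs (inter-modality, instance view) plus contrast against pseudo-label prototypes, i.e. cluster centroids (intra- and inter-modality, class view), as is common in unsupervised re-identification. All function names (`info_nce`, `class_prototypes`, `hh_style_loss`), the temperature of 0.07, and the equal weighting of terms are hypothetical; an intra-modality instance term (which would need augmented views) is omitted. This is not HHLLM's actual objective.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = List[float]


def l2_normalize(v: Sequence[float]) -> Vector:
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def info_nce(queries: List[Vector], keys: List[Vector],
             positives: List[int], temperature: float = 0.07) -> float:
    """Mean InfoNCE loss: query i should score keys[positives[i]] above all other keys."""
    total = 0.0
    for q, pos in zip(queries, positives):
        logits = [dot(q, k) / temperature for k in keys]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
        total += log_sum - logits[pos]
    return total / len(queries)


def class_prototypes(feats: List[Vector], labels: List[int]) -> Tuple[List[int], List[Vector]]:
    """Hypothetical class view: one normalized mean feature per pseudo-label."""
    groups: Dict[int, List[Vector]] = {}
    for f, y in zip(feats, labels):
        groups.setdefault(y, []).append(f)
    classes = sorted(groups)
    protos = [l2_normalize([sum(col) / len(groups[c]) for col in zip(*groups[c])])
              for c in classes]
    return classes, protos


def hh_style_loss(img: List[Vector], txt: List[Vector], labels: List[int],
                  temperature: float = 0.07) -> float:
    """Sum of hypothetical instance-view and class-view contrastive terms.

    img[i] and txt[i] are assumed to be a matched image-text pair, and
    labels[i] their shared pseudo-label (e.g. from clustering).
    """
    img = [l2_normalize(v) for v in img]
    txt = [l2_normalize(v) for v in txt]
    idx = list(range(len(img)))
    # Inter-modality, instance view: symmetric image<->text InfoNCE over matched pairs.
    inter_inst = 0.5 * (info_nce(img, txt, idx, temperature)
                        + info_nce(txt, img, idx, temperature))
    # Class view: prototypes per modality; both use the same label set, so indices align.
    classes, img_protos = class_prototypes(img, labels)
    _, txt_protos = class_prototypes(txt, labels)
    pos = [classes.index(y) for y in labels]
    # Intra-modality, class view: each feature vs. prototypes of its own modality.
    intra_cls = 0.5 * (info_nce(img, img_protos, pos, temperature)
                       + info_nce(txt, txt_protos, pos, temperature))
    # Inter-modality, class view: each feature vs. prototypes of the other modality.
    inter_cls = 0.5 * (info_nce(img, txt_protos, pos, temperature)
                       + info_nce(txt, img_protos, pos, temperature))
    return inter_inst + intra_cls + inter_cls
```

With toy two-dimensional features, correctly matched image-text pairs yield a near-zero loss, while swapping the text features yields a much larger one, which is the behavior such an objective is meant to reward. In practice this would be written with batched tensor operations rather than Python loops.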