Homogeneous and Heterogeneous Feature Learning Based on Large Models for Unsupervised Text-to-Image Person Re-Identification

Bibliographic Details
Published in	2025 IEEE 2nd International Conference on Deep Learning and Computer Vision (DLCV) pp. 1 - 5
Main Authors	Shao, Chenglong; Si, Tongzhen; Zhou, Jiehan; Yang, Xiaohui
Format	Conference Proceeding
Language	English
Published	IEEE 06.06.2025
Subjects	Adaptation models; Computational modeling; Identification of persons; large model; Pedestrians; Reliability engineering; Representation learning; Supervised learning; Text to image; Text-to-image person re-identification; Unsupervised learning
DOI	10.1109/DLCV65218.2025.11088688

Abstract	Text-to-image person re-identification (TIReID) aims to identify and retrieve target pedestrians given textual queries. Driven by large amounts of annotated data, existing supervised methods have achieved promising performance. However, manually annotating large-scale databases is extremely time-consuming and impractical, which limits the use of these methods in real-world scenarios. Several methods fine-tune a single multimodal large language model (MLLM) to construct cross-modality databases and use a contrastive loss to constrain sample features. However, they neglect the reliability of the text generation and feature optimization processes. To address this, we propose Homogeneous and Heterogeneous Feature Learning based on Large Models (HHFLLM) for the unsupervised TIReID task. First, we design a text generation process with jointly used large models, which leverages the diverse strengths of multiple MLLMs to generate and filter reliable texts for constructing image-text matching relationships. Second, we introduce an adapter-based learning strategy to transfer image-text prior knowledge and enhance feature representation capability. Furthermore, we construct a Homogeneous and Heterogeneous Feature Learning (HHFL) process, which continuously optimizes intra-modality and inter-modality features from both class and instance views. We perform extensive experiments on benchmark TIReID databases to evaluate HHFLLM. The results demonstrate that our method achieves state-of-the-art performance among unsupervised methods.
Affiliations	Shao, Chenglong (ise_clshao@stu.ujn.edu.cn): University of Jinan, Shandong Key Laboratory of Ubiquitous Intelligent Computing, Jinan, China
	Si, Tongzhen (ise_sitz@ujn.edu.cn): University of Jinan, Shandong Key Laboratory of Ubiquitous Intelligent Computing, Jinan, China
	Zhou, Jiehan (jiehan.zhou@sdust.edu.cn): College of Computer Science and Engineering, Shandong University of Science and Technology, Qingdao, China
	Yang, Xiaohui (ise_xhyang@ujn.edu.cn): University of Jinan, Shandong Key Laboratory of Ubiquitous Intelligent Computing, Jinan, China
EISBN	9798331522698
Genre	orig-research
Funding	University of Jinan, grant 1009569 (funder ID 10.13039/501100004023); Shandong Provincial Natural Science Foundation, grant ZR2024QF185 (funder ID 10.13039/501100007129)
IngestDate	Wed Aug 20 06:20:57 EDT 2025
IsPeerReviewed	false
IsScholarly	false
PageCount	5
URI	https://ieeexplore.ieee.org/document/11088688