Homogeneous and Heterogeneous Feature Learning Based on Large Models for Unsupervised Text-to-Image Person Re-Identification

Bibliographic Details
Published in 2025 IEEE 2nd International Conference on Deep Learning and Computer Vision (DLCV), pp. 1-5
Main Authors Shao, Chenglong, Si, Tongzhen, Zhou, Jiehan, Yang, Xiaohui
Format Conference Proceeding
Language English
Published IEEE 06.06.2025
DOI 10.1109/DLCV65218.2025.11088688

Abstract Text-to-image person re-identification (TIReID) aims to identify and retrieve target pedestrians from given textual queries. Driven by large amounts of annotated data, existing supervised methods have achieved promising performance. However, manually annotating large-scale databases is extremely time-consuming and often impractical, which restricts the use of these methods in practical scenarios. Several methods fine-tune a single multimodal large language model (MLLM) to construct cross-modality databases and employ a contrastive loss to constrain sample features, but they neglect the reliability of the text generation and feature optimization processes. To this end, we propose Homogeneous and Heterogeneous Feature Learning based on Large Models (HHLLM) for the unsupervised TIReID task. First, we design a text generation process with joint large models that leverages the diversity of MLLMs to generate and filter reliable texts for constructing image-text matching relationships. Second, we introduce an adapter-based learning strategy to transfer image-text prior knowledge and enhance feature representation capability. Furthermore, we construct a Homogeneous and Heterogeneous Feature Learning (HHFL) process, which continuously optimizes intra-modality and inter-modality features from class and instance views. We perform extensive experiments on benchmark TIReID databases to evaluate HHLLM. The experimental results demonstrate that our method achieves state-of-the-art performance among unsupervised methods.
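Illustrative note: the abstract mentions that prior methods employ a contrastive loss to constrain image and text features, but it does not give the loss in closed form. The sketch below shows one common choice, a symmetric InfoNCE-style image-text loss, purely as an assumption for orientation; it is not taken from the paper. The function names and the `temperature` default are hypothetical.

```python
import math


def _normalize(vec):
    """Scale a feature vector to unit L2 norm."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def _cross_entropy_diag(logits):
    """Mean of -log softmax(row_i)[i]; row i is expected to match column i."""
    total = 0.0
    for i, row in enumerate(logits):
        row_max = max(row)  # subtract the max for numerical stability
        log_sum = row_max + math.log(sum(math.exp(x - row_max) for x in row))
        total += log_sum - row[i]
    return total / len(logits)


def symmetric_contrastive_loss(image_feats, text_feats, temperature=0.07):
    """Generic symmetric image-text contrastive loss (assumed form, not the paper's).

    image_feats[i] and text_feats[i] are treated as a matched pair; every other
    pairing in the batch acts as a negative.
    """
    imgs = [_normalize(v) for v in image_feats]
    txts = [_normalize(v) for v in text_feats]
    # Cosine similarities scaled by the temperature.
    logits = [
        [sum(a * b for a, b in zip(img, txt)) / temperature for txt in txts]
        for img in imgs
    ]
    logits_t = [list(col) for col in zip(*logits)]  # text-to-image direction
    return 0.5 * (_cross_entropy_diag(logits) + _cross_entropy_diag(logits_t))
```

In this form, a batch whose image and text features are correctly paired yields a lower loss than the same batch with the pairing shuffled, which is the property an unsupervised pipeline would rely on once generated texts have been filtered for reliability.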
Author Zhou, Jiehan
Yang, Xiaohui
Shao, Chenglong
Si, Tongzhen
Author_xml – sequence: 1
  givenname: Chenglong
  surname: Shao
  fullname: Shao, Chenglong
  email: ise_clshao@stu.ujn.edu.cn
  organization: University of Jinan,Shandong Key Laboratory of Ubiquitous Intelligent Computing,Jinan,China
– sequence: 2
  givenname: Tongzhen
  surname: Si
  fullname: Si, Tongzhen
  email: ise_sitz@ujn.edu.cn
  organization: University of Jinan,Shandong Key Laboratory of Ubiquitous Intelligent Computing,Jinan,China
– sequence: 3
  givenname: Jiehan
  surname: Zhou
  fullname: Zhou, Jiehan
  email: jiehan.zhou@sdust.edu.cn
  organization: College of Computer Science and Engineering, Shandong University of Science and Technology,Qingdao,China
– sequence: 4
  givenname: Xiaohui
  surname: Yang
  fullname: Yang, Xiaohui
  email: ise_xhyang@ujn.edu.cn
  organization: University of Jinan,Shandong Key Laboratory of Ubiquitous Intelligent Computing,Jinan,China
ContentType Conference Proceeding
DBID 6IE
6IL
CBEJK
RIE
RIL
DOI 10.1109/DLCV65218.2025.11088688
DatabaseName IEEE Electronic Library (IEL) Conference Proceedings
IEEE Xplore POP ALL
IEEE Xplore All Conference Proceedings
IEEE/IET Electronic Library
IEEE Proceedings Order Plans (POP All) 1998-Present
DatabaseTitleList
Database_xml – sequence: 1
  dbid: RIE
  name: IEEE/IET Electronic Library
  url: https://proxy.k.utb.cz/login?url=https://ieeexplore.ieee.org/
  sourceTypes: Publisher
DeliveryMethod fulltext_linktorsrc
EISBN 9798331522698
EndPage 5
ExternalDocumentID 11088688
Genre orig-research
GrantInformation_xml – fundername: University of Jinan
  grantid: 1009569
  funderid: 10.13039/501100004023
– fundername: Shandong Provincial Natural Science Foundation
  grantid: ZR2024QF185
  funderid: 10.13039/501100007129
ID FETCH-LOGICAL-i93t-7dd86d7642c6bf9b511d11a3ae3bd7652846c1edca875b98abd49bfa94879243
IEDL.DBID RIE
IngestDate Wed Aug 20 06:20:57 EDT 2025
IsPeerReviewed false
IsScholarly false
Language English
LinkModel DirectLink
MergedId FETCHMERGED-LOGICAL-i93t-7dd86d7642c6bf9b511d11a3ae3bd7652846c1edca875b98abd49bfa94879243
PageCount 5
ParticipantIDs ieee_primary_11088688
PublicationCentury 2000
PublicationDate 2025-June-6
PublicationDateYYYYMMDD 2025-06-06
PublicationDate_xml – month: 06
  year: 2025
  text: 2025-June-6
  day: 06
PublicationDecade 2020
PublicationTitle 2025 IEEE 2nd International Conference on Deep Learning and Computer Vision (DLCV)
PublicationTitleAbbrev DLCV
PublicationYear 2025
Publisher IEEE
Publisher_xml – name: IEEE
Score 1.9125158
SourceID ieee
SourceType Publisher
StartPage 1
SubjectTerms Adaptation models
Computational modeling
Identification of persons
large model
Pedestrians
Reliability engineering
Representation learning
Supervised learning
Text to image
Text-to-image person re-identification
Unsupervised learning
Title Homogeneous and Heterogeneous Feature Learning Based on Large Models for Unsupervised Text-to-Image Person Re-Identification
URI https://ieeexplore.ieee.org/document/11088688