Homogeneous and Heterogeneous Feature Learning Based on Large Models for Unsupervised Text-to-Image Person Re-Identification
Published in | 2025 IEEE 2nd International Conference on Deep Learning and Computer Vision (DLCV) pp. 1 - 5 |
---|---|
Main Authors | Shao, Chenglong; Si, Tongzhen; Zhou, Jiehan; Yang, Xiaohui |
Format | Conference Proceeding |
Language | English |
Published | IEEE, 06.06.2025 |
DOI | 10.1109/DLCV65218.2025.11088688 |
Abstract | Text-to-image person re-identification (TIReID) aims to identify and retrieve target pedestrians according to given textual queries. Driven by large amounts of annotated data, existing supervised learning methods have achieved promising performance. However, manually annotating large-scale databases is extremely time-consuming and impractical, which restricts the application of these methods in practical scenarios. Several methods fine-tune a single multimodal large language model (MLLM) to construct cross-modality databases and employ a contrastive loss to constrain sample features. However, they neglect the reliability of the text generation and feature optimization processes. To this end, we propose Homogeneous and Heterogeneous Feature Learning based on Large Models (HHFLLM) for the unsupervised TIReID task. First, we design a text generation process with joint large models that leverages the diversity of MLLMs to generate and filter reliable texts for constructing image-text matching relationships. Second, we introduce an adapter-based learning strategy to transfer image-text prior knowledge and enhance the feature representation capability. Furthermore, we construct a Homogeneous and Heterogeneous Feature Learning (HHFL) process, which continuously optimizes the intra-modality and inter-modality features from class and instance views. We perform extensive experiments on benchmark TIReID databases to evaluate HHFLLM. The experimental results demonstrate that our method achieves state-of-the-art performance among unsupervised methods. |
---|---|
Author | Zhou, Jiehan; Yang, Xiaohui; Shao, Chenglong; Si, Tongzhen |
Author_xml | – sequence: 1 givenname: Chenglong surname: Shao fullname: Shao, Chenglong email: ise_clshao@stu.ujn.edu.cn organization: University of Jinan,Shandong Key Laboratory of Ubiquitous Intelligent Computing,Jinan,China – sequence: 2 givenname: Tongzhen surname: Si fullname: Si, Tongzhen email: ise_sitz@ujn.edu.cn organization: University of Jinan,Shandong Key Laboratory of Ubiquitous Intelligent Computing,Jinan,China – sequence: 3 givenname: Jiehan surname: Zhou fullname: Zhou, Jiehan email: jiehan.zhou@sdust.edu.cn organization: College of Computer Science and Engineering, Shandong University of Science and Technology,Qingdao,China – sequence: 4 givenname: Xiaohui surname: Yang fullname: Yang, Xiaohui email: ise_xhyang@ujn.edu.cn organization: University of Jinan,Shandong Key Laboratory of Ubiquitous Intelligent Computing,Jinan,China |
ContentType | Conference Proceeding |
DBID | 6IE 6IL CBEJK RIE RIL |
DatabaseName | IEEE Electronic Library (IEL) Conference Proceedings IEEE Xplore POP ALL IEEE Xplore All Conference Proceedings IEEE/IET Electronic Library IEEE Proceedings Order Plans (POP All) 1998-Present |
Database_xml | – sequence: 1 dbid: RIE name: IEEE/IET Electronic Library url: https://proxy.k.utb.cz/login?url=https://ieeexplore.ieee.org/ sourceTypes: Publisher |
DeliveryMethod | fulltext_linktorsrc |
EISBN | 9798331522698 |
EndPage | 5 |
ExternalDocumentID | 11088688 |
Genre | orig-research |
GrantInformation_xml | – fundername: University of Jinan grantid: 1009569 funderid: 10.13039/501100004023 – fundername: Shandong Provincial Natural Science Foundation grantid: ZR2024QF185 funderid: 10.13039/501100007129 |
GroupedDBID | 6IE 6IL CBEJK RIE RIL |
IEDL.DBID | RIE |
IngestDate | Wed Aug 20 06:20:57 EDT 2025 |
IsPeerReviewed | false |
IsScholarly | false |
Language | English |
LinkModel | DirectLink |
PageCount | 5 |
ParticipantIDs | ieee_primary_11088688 |
PublicationCentury | 2000 |
PublicationDate | 2025-June-6 |
PublicationDateYYYYMMDD | 2025-06-06 |
PublicationDate_xml | – month: 06 year: 2025 text: 2025-June-6 day: 06 |
PublicationDecade | 2020 |
PublicationTitle | 2025 IEEE 2nd International Conference on Deep Learning and Computer Vision (DLCV) |
PublicationTitleAbbrev | DLCV |
PublicationYear | 2025 |
Publisher | IEEE |
Publisher_xml | – name: IEEE |
SourceID | ieee |
SourceType | Publisher |
StartPage | 1 |
SubjectTerms | Adaptation models; Computational modeling; Identification of persons; large model; Pedestrians; Reliability engineering; Representation learning; Supervised learning; Text to image; Text-to-image person re-identification; Unsupervised learning |
Title | Homogeneous and Heterogeneous Feature Learning Based on Large Models for Unsupervised Text-to-Image Person Re-Identification |
URI | https://ieeexplore.ieee.org/document/11088688 |
hasFullText | 1 |
inHoldings | 1 |