Self-Supervised Temporal Sensitive Hashing for Video Retrieval
Published in | IEEE Transactions on Multimedia, Vol. 26, pp. 9021–9035 |
---|---|
Main Authors | Li, Qihua; Tian, Xing; Ng, Wing W. Y. |
Format | Journal Article |
Language | English |
Published | IEEE, 2024 |
ISSN | 1520-9210 (print); 1941-0077 (electronic) |
DOI | 10.1109/TMM.2024.3385183 |
Abstract | Self-supervised video hashing methods retrieve large-scale video data without labels by making full use of the visual and temporal information in the original videos. Existing methods are not robust to small temporal differences between similar videos because they ignore future unseen samples along the temporal dimension, which leads to large generalization errors. Moreover, existing self-supervised methods cannot efficiently and effectively preserve pairwise similarity information between large-scale unlabeled data. Thus, a self-supervised temporal sensitive video hashing (TSVH) method is proposed in this paper for video retrieval. TSVH uses a transformer-based autoencoder network with temporal sensitivity regularization to achieve low sensitivity to local temporal perturbations while preserving global temporal sequence information. Pairwise similarity between video samples is effectively preserved by applying a hashing-based affinity matrix. Experiments on realistic datasets show that TSVH outperforms several state-of-the-art and classic methods. |
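The two ingredients named in the abstract, a hashing-based affinity matrix for pairwise similarity and a penalty on sensitivity to local temporal perturbations, can be illustrated with a minimal sketch. This is an assumption-laden illustration, not the paper's implementation: `toy_encoder`, the Gaussian frame jitter, and the dot-product code affinity are stand-ins for the transformer autoencoder and training objectives the paper describes.

```python
import numpy as np

def affinity_matrix(codes):
    """Pairwise similarity from K-bit hash codes in {-1, +1}.
    S[i, j] = b_i . b_j / K, so S is symmetric with unit diagonal."""
    k = codes.shape[1]
    return codes @ codes.T / k

def temporal_sensitivity(encoder, frames, noise_scale=0.05, seed=0):
    """Mean change of the (relaxed) hash code when frame features receive a
    small random jitter, standing in for a local temporal perturbation."""
    rng = np.random.default_rng(seed)
    perturbed = frames + noise_scale * rng.standard_normal(frames.shape)
    return float(np.mean(np.abs(encoder(frames) - encoder(perturbed))))

def toy_encoder(frames, bits=16, seed=42):
    """Hypothetical stand-in for the transformer autoencoder: mean-pool the
    frame features, project to `bits` dimensions, squash with tanh as the
    usual continuous relaxation of binary codes."""
    rng = np.random.default_rng(seed)
    w = rng.standard_normal((frames.shape[1], bits)) / np.sqrt(frames.shape[1])
    return np.tanh(frames.mean(axis=0) @ w)
```

A training objective in this spirit would combine reconstruction error with the sensitivity term, while pushing the code affinity matrix toward an affinity target computed from the original features.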
Author | Ng, Wing W. Y.; Li, Qihua; Tian, Xing |
Author_xml | – sequence 1: Li, Qihua (ORCID 0009-0006-0937-9012; liqihua1999@163.com), Guangdong Provincial Key Laboratory of Computational Intelligence and Cyberspace Information, School of Computer Science and Engineering, South China University of Technology, Guangzhou, Guangdong, China
– sequence 2: Tian, Xing (ORCID 0000-0002-7546-1018; shawntian123@gmail.com), School of Artificial Intelligence, South China Normal University, Guangzhou, Guangdong, China
– sequence 3: Ng, Wing W. Y. (ORCID 0000-0003-0783-3585; wingng@ieee.org), Guangdong Provincial Key Laboratory of Computational Intelligence and Cyberspace Information, School of Computer Science and Engineering, South China University of Technology, Guangzhou, Guangdong, China |
CODEN | ITMUF8 |
Discipline | Engineering; Computer Science |
Genre | orig-research |
GrantInformation | – National Natural Science Foundation of China; grant 62202175; funder ID 10.13039/501100001809
– Guangdong Basic and Applied Basic Research Foundation (Basic and Applied Basic Research Foundation of Guangdong Province); grant 2024A1515011896; funder ID 10.13039/501100021171 |
IsPeerReviewed | true |
IsScholarly | true |
License | https://ieeexplore.ieee.org/Xplorehelp/downloads/license-information/IEEE.html https://doi.org/10.15223/policy-029 https://doi.org/10.15223/policy-037 |
PageCount | 15 |
PublicationTitleAbbrev | TMM |
SubjectTerms | Hash functions; Long short term memory; Perturbation methods; Robustness; Self-supervise; Sensitivity; Training; transformer; Transformers; video hashing; video retrieval |
URI | https://ieeexplore.ieee.org/document/10492503 |