Self-Supervised Temporal Sensitive Hashing for Video Retrieval
Published in | IEEE Transactions on Multimedia, Vol. 26, pp. 9021–9035 |
---|---|
Main Authors | Li, Qihua; Tian, Xing; Ng, Wing W. Y. |
Format | Journal Article |
Language | English |
Published | IEEE, 2024 |
ISSN | 1520-9210 (print); 1941-0077 (electronic) |
DOI | 10.1109/TMM.2024.3385183 |
Abstract | Self-supervised video hashing methods retrieve large-scale video data without labels by making full use of the visual and temporal information in the original videos. Existing methods are not robust to small temporal differences between similar videos because they ignore future unseen samples along the temporal dimension, which leads to large generalization errors. Moreover, existing self-supervised methods cannot efficiently and effectively preserve pairwise similarity information between large-scale unlabeled data. Thus, a self-supervised temporal sensitive video hashing (TSVH) method is proposed in this paper for video retrieval. TSVH uses a transformer-based autoencoder network with temporal sensitivity regularization to achieve low sensitivity to local temporal perturbations while preserving global temporal sequence information. Pairwise similarity between video samples is effectively preserved by applying a hashing-based affinity matrix. Experiments on realistic datasets show that TSVH outperforms several state-of-the-art and classic methods. |
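The two ingredients named in the abstract, a hashing-based affinity matrix for pairwise similarity and a penalty on sensitivity to local temporal perturbations, can be illustrated with a minimal sketch. This is an assumption-laden illustration, not the paper's implementation: `toy_encoder`, the Gaussian frame jitter, and the dot-product code affinity are stand-ins for the transformer autoencoder and training objectives the paper describes.

```python
import numpy as np

def affinity_matrix(codes):
    """Pairwise similarity from K-bit hash codes in {-1, +1}.
    S[i, j] = b_i . b_j / K, so S is symmetric with unit diagonal."""
    k = codes.shape[1]
    return codes @ codes.T / k

def temporal_sensitivity(encoder, frames, noise_scale=0.05, seed=0):
    """Mean change of the (relaxed) hash code when frame features receive a
    small random jitter, standing in for a local temporal perturbation."""
    rng = np.random.default_rng(seed)
    perturbed = frames + noise_scale * rng.standard_normal(frames.shape)
    return float(np.mean(np.abs(encoder(frames) - encoder(perturbed))))

def toy_encoder(frames, bits=16, seed=42):
    """Hypothetical stand-in for the transformer autoencoder: mean-pool the
    frame features, project to `bits` dimensions, squash with tanh as the
    usual continuous relaxation of binary codes."""
    rng = np.random.default_rng(seed)
    w = rng.standard_normal((frames.shape[1], bits)) / np.sqrt(frames.shape[1])
    return np.tanh(frames.mean(axis=0) @ w)
```

A training objective in this spirit would combine reconstruction error with the sensitivity term, while pushing the code affinity matrix toward an affinity target computed from the original features.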
Author | Ng, Wing W. Y.; Li, Qihua; Tian, Xing |
Author_xml | – sequence 1: Li, Qihua (ORCID 0009-0006-0937-9012; liqihua1999@163.com), Guangdong Provincial Key Laboratory of Computational Intelligence and Cyberspace Information, School of Computer Science and Engineering, South China University of Technology, Guangzhou, Guangdong, China
– sequence 2: Tian, Xing (ORCID 0000-0002-7546-1018; shawntian123@gmail.com), School of Artificial Intelligence, South China Normal University, Guangzhou, Guangdong, China
– sequence 3: Ng, Wing W. Y. (ORCID 0000-0003-0783-3585; wingng@ieee.org), Guangdong Provincial Key Laboratory of Computational Intelligence and Cyberspace Information, School of Computer Science and Engineering, South China University of Technology, Guangzhou, Guangdong, China |
CODEN | ITMUF8 |
Discipline | Engineering; Computer Science |
Genre | orig-research |
GrantInformation | – National Natural Science Foundation of China; grant 62202175; funder ID 10.13039/501100001809
– Guangdong Basic and Applied Basic Research Foundation (Basic and Applied Basic Research Foundation of Guangdong Province); grant 2024A1515011896; funder ID 10.13039/501100021171 |
IsPeerReviewed | true |
IsScholarly | true |
License | https://ieeexplore.ieee.org/Xplorehelp/downloads/license-information/IEEE.html https://doi.org/10.15223/policy-029 https://doi.org/10.15223/policy-037 |
PageCount | 15 |
PublicationTitleAbbrev | TMM |
SubjectTerms | Hash functions; Long short term memory; Perturbation methods; Robustness; Self-supervise; Sensitivity; Training; transformer; Transformers; video hashing; video retrieval |
URI | https://ieeexplore.ieee.org/document/10492503 |