Self-Supervised Temporal Sensitive Hashing for Video Retrieval

Bibliographic Details
Published in IEEE transactions on multimedia Vol. 26; pp. 9021 - 9035
Main Authors Li, Qihua, Tian, Xing, Ng, Wing W. Y.
Format Journal Article
Language English
Published IEEE 2024
Subjects
ISSN1520-9210
1941-0077
DOI 10.1109/TMM.2024.3385183

Abstract Self-supervised video hashing methods retrieve large-scale video data without labels by making full use of the visual and temporal information in original videos. Existing methods are not robust to small temporal differences between similar videos because they ignore future unseen samples in the temporal dimension, which leads to large generalization errors. At the same time, existing self-supervised methods cannot preserve pairwise similarity information between large-scale unlabeled data efficiently and effectively. Thus, a self-supervised temporal sensitive video hashing (TSVH) method is proposed in this paper for video retrieval. TSVH uses a transformer-based autoencoder network with temporal sensitivity regularization to achieve low sensitivity to local temporal perturbations while preserving information about the global temporal sequence. The pairwise similarity between video samples is preserved effectively by applying a hashing-based affinity matrix. Experiments on realistic datasets show that TSVH outperforms several state-of-the-art and classic methods.
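The abstract's three ingredients (binary codes from pooled frame features, a hashing-based pairwise affinity, and low sensitivity of codes to local temporal perturbations) can be illustrated with a minimal numpy sketch. This is not the paper's method: the transformer-based autoencoder is replaced by a single random projection, the temporally weighted pooling and all names (`hash_codes`, `affinity`, `temporal_sensitivity`) are illustrative assumptions, and the sensitivity estimate is a crude stand-in for the regularizer described in the abstract.

```python
import numpy as np

rng = np.random.default_rng(0)

def hash_codes(frame_features, projection):
    # Temporally weighted pooling so that frame order influences the code;
    # the paper's transformer encoder is replaced here by one random
    # projection purely for illustration.
    weights = np.linspace(0.5, 1.5, len(frame_features))
    pooled = (weights[:, None] * frame_features).sum(axis=0) / weights.sum()
    return np.sign(projection @ pooled)  # codes in {-1, +1}

def affinity(codes_a, codes_b):
    # Hashing-based pairwise similarity: normalized inner product of
    # +-1 codes, i.e. 1 - 2 * HammingDistance / bits.
    return float(codes_a @ codes_b) / len(codes_a)

def temporal_sensitivity(frame_features, projection, swaps=5):
    # Fraction of code bits flipped by swapping adjacent frames: a crude
    # proxy for sensitivity to small local temporal perturbations.
    base = hash_codes(frame_features, projection)
    flipped = []
    for _ in range(swaps):
        t = int(rng.integers(0, len(frame_features) - 1))
        perturbed = frame_features.copy()
        perturbed[[t, t + 1]] = perturbed[[t + 1, t]]
        flipped.append(np.mean(hash_codes(perturbed, projection) != base))
    return float(np.mean(flipped))

T, D, B = 16, 64, 32  # frames, feature dimension, code bits
video = rng.normal(size=(T, D))   # stand-in for extracted frame features
W = rng.normal(size=(B, D))       # stand-in for a learned encoder
codes = hash_codes(video, W)
sensitivity = temporal_sensitivity(video, W)
```

A regularizer in the spirit of the abstract would penalize `temporal_sensitivity` during training so that nearly identical videos map to nearby codes, while the weighted pooling (in the paper, the transformer) keeps global ordering information in the representation.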
Author Ng, Wing W. Y.
Li, Qihua
Tian, Xing
Author_xml – sequence: 1
  givenname: Qihua
  orcidid: 0009-0006-0937-9012
  surname: Li
  fullname: Li, Qihua
  email: liqihua1999@163.com
  organization: Guangdong Provincial Key Laboratory of Computational Intelligence and Cyberspace Information, School of Computer Science and Engineering, South China University of Technology, Guangzhou, Guangdong, China
– sequence: 2
  givenname: Xing
  orcidid: 0000-0002-7546-1018
  surname: Tian
  fullname: Tian, Xing
  email: shawntian123@gmail.com
  organization: School of Artificial Intelligence, South China Normal University, Guangzhou, Guangdong, China
– sequence: 3
  givenname: Wing W. Y.
  orcidid: 0000-0003-0783-3585
  surname: Ng
  fullname: Ng, Wing W. Y.
  email: wingng@ieee.org
  organization: Guangdong Provincial Key Laboratory of Computational Intelligence and Cyberspace Information, School of Computer Science and Engineering, South China University of Technology, Guangzhou, Guangdong, China
CODEN ITMUF8
ContentType Journal Article
DBID 97E
RIA
RIE
AAYXX
CITATION
DOI 10.1109/TMM.2024.3385183
DatabaseName IEEE All-Society Periodicals Package (ASPP) 2005–Present
IEEE All-Society Periodicals Package (ASPP) 1998–Present
IEEE Electronic Library (IEL)
CrossRef
DatabaseTitle CrossRef
DatabaseTitleList
Database_xml – sequence: 1
  dbid: RIE
  name: IEEE Electronic Library (IEL)
  url: https://proxy.k.utb.cz/login?url=https://ieeexplore.ieee.org/
  sourceTypes: Publisher
DeliveryMethod fulltext_linktorsrc
Discipline Engineering
Computer Science
EISSN 1941-0077
EndPage 9035
ExternalDocumentID 10_1109_TMM_2024_3385183
10492503
Genre orig-research
GrantInformation_xml – fundername: National Natural Science Foundation of China
  grantid: 62202175
  funderid: 10.13039/501100001809
– fundername: Guangdong Basic and Applied Basic Research Foundation
  grantid: 2024A1515011896
  funderid: 10.13039/501100021171
IEDL.DBID RIE
ISSN 1520-9210
IngestDate Tue Jul 01 01:54:43 EDT 2025
Wed Aug 27 02:28:18 EDT 2025
IsPeerReviewed true
IsScholarly true
Language English
License https://ieeexplore.ieee.org/Xplorehelp/downloads/license-information/IEEE.html
https://doi.org/10.15223/policy-029
https://doi.org/10.15223/policy-037
LinkModel DirectLink
ORCID 0000-0002-7546-1018
0000-0003-0783-3585
0009-0006-0937-9012
PageCount 15
ParticipantIDs crossref_primary_10_1109_TMM_2024_3385183
ieee_primary_10492503
PublicationCentury 2000
PublicationDate 20240000
2024-00-00
PublicationDateYYYYMMDD 2024-01-01
PublicationDate_xml – year: 2024
  text: 20240000
PublicationDecade 2020
PublicationTitle IEEE transactions on multimedia
PublicationTitleAbbrev TMM
PublicationYear 2024
Publisher IEEE
Publisher_xml – name: IEEE
References_xml – ident: ref35
  doi: 10.1109/TMM.2016.2645404
– ident: ref36
  doi: 10.1109/TMM.2018.2890362
– ident: ref19
  doi: 10.1109/TCSVT.2020.2974768
– ident: ref17
  doi: 10.1109/CVPR46437.2021.01334
– ident: ref13
  doi: 10.1109/TPAMI.2012.48
– ident: ref10
  doi: 10.1109/TMM.2019.2946096
– ident: ref39
  doi: 10.48550/arXiv.1810.04805
– ident: ref50
  doi: 10.24963/ijcai.2017/437
– ident: ref4
  doi: 10.1109/TMM.2016.2515990
– ident: ref37
  doi: 10.1109/TMM.2020.2978593
– ident: ref3
  doi: 10.1109/TIP.2023.3278474
– ident: ref47
  doi: 10.1007/978-3-031-19781-9_11
– ident: ref43
  doi: 10.1609/aaai.v37i3.25373
– ident: ref20
  doi: 10.1109/TIP.2019.2940693
– ident: ref46
  doi: 10.1109/TPDS.2020.2975550
– ident: ref49
  doi: 10.1109/TKDE.2016.2562624
– volume: 21
  start-page: 1753
  volume-title: Proc. Adv. Neural Inf. Process. Syst.
  year: 2008
  ident: ref29
  article-title: Spectral hashing
– ident: ref7
  doi: 10.1109/TPAMI.2012.193
– ident: ref9
  doi: 10.1109/TKDE.2012.76
– ident: ref23
  doi: 10.1109/TIP.2018.2889269
– ident: ref5
  doi: 10.1109/TIP.2018.2814344
– ident: ref41
  doi: 10.1016/j.neucom.2022.06.067
– ident: ref42
  doi: 10.1007/s11263-019-01166-4
– start-page: 1
  volume-title: Proc. 28th Int. Conf. Int. Conf. Mach. Learn.
  year: 2011
  ident: ref11
  article-title: Hashing with graphs
– ident: ref15
  doi: 10.1145/2072298.2072354
– year: 2015
  ident: ref34
  article-title: Very deep convolutional networks for large-scale image recognition
– ident: ref18
  doi: 10.1109/ICCV.2013.282
– year: 2016
  ident: ref32
  article-title: Video key frame extraction using entropy value as global and local feature
– ident: ref12
  doi: 10.1109/TCSVT.2021.3093258
– ident: ref16
  doi: 10.1145/2964284.2964308
– ident: ref2
  doi: 10.1109/TIP.2019.2940683
– ident: ref45
  doi: 10.1145/2812802
– ident: ref21
  doi: 10.1109/TNNLS.2020.2997020
– ident: ref8
  doi: 10.1109/TCDS.2019.2963339
– ident: ref24
  doi: 10.1016/j.neunet.2005.06.042
– ident: ref6
  doi: 10.1109/TIP.2018.2882155
– ident: ref26
  doi: 10.1109/TCYB.2019.2923756
– start-page: 518
  volume-title: Proc. 25th Int. Conf. Very Large Data Bases
  year: 1999
  ident: ref28
  article-title: Similarity search in high dimensions via hashing
– ident: ref25
  doi: 10.1109/TMM.2021.3070127
– volume: 206
  start-page: 6722
  volume-title: Proc. Int. Conf. Artif. Intell. Statist.
  year: 2023
  ident: ref48
  article-title: Uncertainty-aware unsupervised video hashing
– ident: ref22
  doi: 10.1109/TCSVT.2020.3001583
– ident: ref31
  doi: 10.1145/3532624
– ident: ref27
  doi: 10.1109/tcyb.2023.3269756
– ident: ref40
  doi: 10.1109/TIP.2019.2948472
– ident: ref51
  doi: 10.1109/TMM.2020.2994509
– ident: ref14
  doi: 10.1109/TMM.2016.2610324
– ident: ref1
  doi: 10.1109/TMM.2016.2557059
– start-page: 1
  volume-title: Proc. Trecvid Workshop Participants Notebook Papers
  year: 2014
  ident: ref33
  article-title: TRECVID - an overview of the goals, tasks, data, evaluation mechanisms and metrics
– ident: ref38
  doi: 10.48550/ARXIV.1706.03762
– ident: ref44
  doi: 10.1109/TPAMI.2017.2670560
– ident: ref30
  doi: 10.1109/CVPR.2015.7298598
SSID ssj0014507
SourceID crossref
ieee
SourceType Index Database
Publisher
StartPage 9021
SubjectTerms Hash functions
Long short term memory
Perturbation methods
Robustness
Self-supervised
Sensitivity
Training
transformer
Transformers
video hashing
video retrieval
Title Self-Supervised Temporal Sensitive Hashing for Video Retrieval
URI https://ieeexplore.ieee.org/document/10492503
Volume 26