Self-Supervised Temporal Sensitive Hashing for Video Retrieval

Bibliographic Details
Published in: IEEE Transactions on Multimedia, Vol. 26, pp. 9021-9035
Main Authors: Li, Qihua; Tian, Xing; Ng, Wing W. Y.
Format: Journal Article
Language: English
Published: IEEE, 2024

Summary: Self-supervised video hashing methods retrieve large-scale video data without labels by making full use of the visual and temporal information in the original videos. Existing methods are not robust to small temporal differences between similar videos because they ignore future unseen samples in the temporal domain, which leads to large generalization errors. At the same time, existing self-supervised methods cannot preserve pairwise similarity information between large-scale unlabeled data efficiently and effectively. Thus, a self-supervised temporal-sensitive video hashing (TSVH) method is proposed in this paper for video retrieval. TSVH uses a transformer-based autoencoder network with temporal sensitivity regularization to achieve low sensitivity to local temporal perturbations while preserving global temporal sequence information. The pairwise similarity between video samples is preserved effectively by applying a hashing-based affinity matrix. Experiments on realistic datasets show that TSVH outperforms several state-of-the-art and classic methods.
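Based only on the abstract above, the following is a minimal PyTorch sketch of the two ideas it names: a temporal sensitivity regularizer that penalizes hash-code changes under a small local temporal perturbation, and a hashing-based affinity matrix that ties code similarities to feature similarities. All names (TSVHSketch, temporal_sensitivity_loss, affinity_loss), the choice of perturbation, and every hyper-parameter are illustrative assumptions, not the authors' implementation; the reconstruction decoder of the autoencoder is omitted here.

# Minimal sketch of the abstract's two mechanisms; not the paper's code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TSVHSketch(nn.Module):
    """Transformer encoder mapping frame features to a relaxed hash code."""
    def __init__(self, feat_dim=2048, model_dim=256, code_bits=64,
                 n_heads=4, n_layers=2):
        super().__init__()
        self.proj = nn.Linear(feat_dim, model_dim)
        layer = nn.TransformerEncoderLayer(d_model=model_dim, nhead=n_heads,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.to_code = nn.Linear(model_dim, code_bits)

    def forward(self, frames):  # frames: (batch, time, feat_dim)
        h = self.encoder(self.proj(frames))  # contextualize frames over time
        # tanh gives a continuous relaxation of a {-1, +1} hash code
        return torch.tanh(self.to_code(h.mean(dim=1)))

def local_temporal_perturb(frames):
    """Swap one random pair of adjacent frames: a small local temporal change."""
    t = frames.size(1)  # assumes t >= 2
    i = torch.randint(0, t - 1, (1,)).item()
    idx = list(range(t))
    idx[i], idx[i + 1] = idx[i + 1], idx[i]
    return frames[:, idx]

def temporal_sensitivity_loss(model, frames):
    """Penalize code changes under local temporal perturbations (low sensitivity)."""
    return F.mse_loss(model(frames), model(local_temporal_perturb(frames)))

def affinity_loss(codes, features):
    """Match code inner products to a feature-based pairwise affinity matrix."""
    s = F.normalize(features.mean(dim=1), dim=1)  # per-video feature vector
    target = s @ s.t()                            # (batch, batch) affinity in [-1, 1]
    pred = codes @ codes.t() / codes.size(1)      # scaled code similarity
    return F.mse_loss(pred, target)

model = TSVHSketch()
frames = torch.randn(8, 16, 2048)  # 8 videos, 16 frames of 2048-d features each
codes = model(frames)
loss = temporal_sensitivity_loss(model, frames) + affinity_loss(codes, frames)
loss.backward()

The perturbation used here, an adjacent-frame swap, is one plausible reading of a "small temporal difference"; the regularizer asks the code to stay stable under it, while the transformer encoder still sees the full global frame order, matching the abstract's stated goal of low local sensitivity with global sequence information preserved.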
ISSN: 1520-9210, 1941-0077
DOI: 10.1109/TMM.2024.3385183