Self-Supervised Temporal Sensitive Hashing for Video Retrieval

Bibliographic Details
Published in: IEEE Transactions on Multimedia, Vol. 26, pp. 9021-9035
Main Authors: Li, Qihua; Tian, Xing; Ng, Wing W. Y.
Format: Journal Article
Language: English
Published: IEEE, 2024

Summary: Self-supervised video hashing methods retrieve large-scale video data without labels by making full use of the visual and temporal information in the original videos. Existing methods are not robust to small temporal differences between similar videos because they ignore future unseen samples in the temporal domain, which leads to large generalization errors. At the same time, existing self-supervised methods cannot preserve pairwise similarity information between large-scale unlabeled data efficiently and effectively. Thus, a self-supervised temporal-sensitive video hashing (TSVH) method is proposed in this paper for video retrieval. TSVH uses a transformer-based autoencoder network with temporal sensitivity regularization to achieve low sensitivity to local temporal perturbations while preserving global temporal sequence information. The pairwise similarity between video samples is preserved effectively by applying a hashing-based affinity matrix. Experiments on realistic datasets show that TSVH outperforms several state-of-the-art and classic methods.
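Based only on the abstract above, the following is a minimal PyTorch sketch of the two ideas it names: a temporal sensitivity regularizer that penalizes hash-code changes under a small local temporal perturbation, and a hashing-based affinity matrix that ties code similarities to feature similarities. All names (TSVHSketch, temporal_sensitivity_loss, affinity_loss), the choice of perturbation, and every hyper-parameter are illustrative assumptions, not the authors' implementation; the reconstruction decoder of the autoencoder is omitted here.

# Minimal sketch of the abstract's two mechanisms; not the paper's code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TSVHSketch(nn.Module):
    """Transformer encoder mapping frame features to a relaxed hash code."""
    def __init__(self, feat_dim=2048, model_dim=256, code_bits=64,
                 n_heads=4, n_layers=2):
        super().__init__()
        self.proj = nn.Linear(feat_dim, model_dim)
        layer = nn.TransformerEncoderLayer(d_model=model_dim, nhead=n_heads,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.to_code = nn.Linear(model_dim, code_bits)

    def forward(self, frames):  # frames: (batch, time, feat_dim)
        h = self.encoder(self.proj(frames))  # contextualize frames over time
        # tanh gives a continuous relaxation of a {-1, +1} hash code
        return torch.tanh(self.to_code(h.mean(dim=1)))

def local_temporal_perturb(frames):
    """Swap one random pair of adjacent frames: a small local temporal change."""
    t = frames.size(1)  # assumes t >= 2
    i = torch.randint(0, t - 1, (1,)).item()
    idx = list(range(t))
    idx[i], idx[i + 1] = idx[i + 1], idx[i]
    return frames[:, idx]

def temporal_sensitivity_loss(model, frames):
    """Penalize code changes under local temporal perturbations (low sensitivity)."""
    return F.mse_loss(model(frames), model(local_temporal_perturb(frames)))

def affinity_loss(codes, features):
    """Match code inner products to a feature-based pairwise affinity matrix."""
    s = F.normalize(features.mean(dim=1), dim=1)  # per-video feature vector
    target = s @ s.t()                            # (batch, batch) affinity in [-1, 1]
    pred = codes @ codes.t() / codes.size(1)      # scaled code similarity
    return F.mse_loss(pred, target)

model = TSVHSketch()
frames = torch.randn(8, 16, 2048)  # 8 videos, 16 frames of 2048-d features each
codes = model(frames)
loss = temporal_sensitivity_loss(model, frames) + affinity_loss(codes, frames)
loss.backward()

The perturbation used here, an adjacent-frame swap, is one plausible reading of a "small temporal difference"; the regularizer asks the code to stay stable under it, while the transformer encoder still sees the full global frame order, matching the abstract's stated goal of low local sensitivity with global sequence information preserved.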
ISSN: 1520-9210, 1941-0077
DOI: 10.1109/TMM.2024.3385183