Self-Supervised Temporal Sensitive Hashing for Video Retrieval
| Published in | IEEE Transactions on Multimedia, Vol. 26, pp. 9021-9035 |
|---|---|
| Format | Journal Article |
| Language | English |
| Published | IEEE, 2024 |
| Summary | Self-supervised video hashing methods retrieve large-scale video data without labels by making full use of the visual and temporal information in the original videos. Existing methods are not robust to small temporal differences between similar videos, because they ignore future unseen samples along the temporal dimension, which leads to large generalization errors. At the same time, existing self-supervised methods cannot efficiently and effectively preserve pairwise similarity information between large-scale unlabeled data. Thus, a self-supervised temporal-sensitive video hashing (TSVH) method is proposed in this paper for video retrieval. TSVH uses a transformer-based autoencoder network with temporal sensitivity regularization to achieve low sensitivity to local temporal perturbations while preserving information about the global temporal sequence. Pairwise similarity between video samples is effectively preserved by applying a hashing-based affinity matrix. Experiments on realistic datasets show that TSVH outperforms several state-of-the-art and classic methods. |
|---|---|
| ISSN | 1520-9210; 1941-0077 |
| DOI | 10.1109/TMM.2024.3385183 |
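The full method is behind the DOI above; as a rough illustration of the two ideas named in the summary, the sketch below shows (a) a hashing-based affinity matrix built from inner products of binary codes and (b) a temporal-sensitivity measure comparing a code against a slightly perturbed version. All shapes, the bit-flip perturbation, and the code values in {-1, +1} are illustrative assumptions, not the paper's actual formulation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins: 5 videos, each with a 16-bit hash code in {-1, +1}.
B = np.sign(rng.standard_normal((5, 16)))
B[B == 0] = 1  # guard against exact zeros from sign()

# Hashing-based affinity matrix: inner products of binary codes, scaled to
# [-1, 1]; entry (i, j) approximates the similarity of videos i and j.
affinity = B @ B.T / B.shape[1]

# Temporal sensitivity: compare the code of a clip with the code of a slightly
# time-perturbed version; a regularizer would penalize a large Hamming distance.
def hamming(a, b):
    return int(np.sum(a != b))

code = B[0]
perturbed = code.copy()
perturbed[:2] *= -1  # flip 2 of the 16 bits to mimic a small temporal change
sensitivity = hamming(code, perturbed) / len(code)
```

A temporal-sensitivity regularizer in this spirit would drive `sensitivity` toward zero for small perturbations, so that near-duplicate videos with minor timing differences map to nearby hash codes.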