Scalable Video Event Retrieval by Visual State Binary Embedding

Bibliographic Details
Published in: IEEE Transactions on Multimedia, Vol. 18, No. 8, pp. 1590-1603
Main Authors: Yu, Litao; Huang, Zi; Cao, Jiewei; Shen, Heng Tao
Format: Journal Article
Language: English
Published: Piscataway: The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 01.08.2016

Summary: With the exponential growth of media data on the web, fast media retrieval has become a significant research topic in multimedia content analysis. Among the variety of techniques, learning binary embedding (hashing) functions is one of the most popular approaches to scalable information retrieval in large databases, and it is mainly used for near-duplicate multimedia search. However, most existing hashing methods are designed for near-duplicate retrieval at the visual level rather than the semantic level. In this paper, we propose a visual state binary embedding (VSBE) model that encodes video frames while preserving the essential semantic information in binary matrices, facilitating fast video event retrieval in unconstrained cases. Compared with other video binary embedding models, one advantage of the proposed VSBE model is that it needs only a limited number of key frames from the training videos for hash function training, so the computational complexity of the training phase is much lower. At the same time, we apply pairwise constraints generated from the visual states to capture the local properties of the events at the semantic level, so retrieval accuracy is also ensured. Extensive experiments on the challenging TRECVID MED dataset demonstrate the superiority of the proposed VSBE model.
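
As a rough illustration of the general binary embedding (hashing) idea described in the summary, and not the paper's actual VSBE training procedure, the following minimal Python sketch maps key-frame feature vectors to binary codes with random hyperplanes (an LSH-style stand-in for learned hash functions) and ranks database frames by Hamming distance. All function names, dimensions, and data here are hypothetical.

    import numpy as np

    def random_projection(dim, n_bits, seed=0):
        # Stand-in hash function: random hyperplanes, not the learned VSBE hash functions.
        rng = np.random.default_rng(seed)
        return rng.standard_normal((dim, n_bits))

    def binary_embed(features, projection):
        # Map real-valued frame features to {0, 1} codes via the sign of a linear projection.
        return (features @ projection > 0).astype(np.uint8)

    def hamming_rank(query_code, db_codes):
        # Rank database entries by Hamming distance to the query code.
        dists = np.count_nonzero(db_codes != query_code, axis=1)
        return np.argsort(dists), dists

    # Toy usage with random "key-frame" features (hypothetical data).
    dim, n_bits = 128, 32
    proj = random_projection(dim, n_bits)
    db_feats = np.random.randn(1000, dim)    # database key-frame features
    query_feat = np.random.randn(1, dim)     # query key-frame feature
    db_codes = binary_embed(db_feats, proj)
    query_code = binary_embed(query_feat, proj)
    order, dists = hamming_rank(query_code, db_codes)
    print("top-5 matches:", order[:5], "distances:", dists[order[:5]])

The compact binary codes make retrieval scalable: comparing codes reduces to cheap bitwise operations, which is what motivates hashing-based methods for large video databases.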
ISSN: 1520-9210, 1941-0077
DOI: 10.1109/TMM.2016.2557059