Video retrieval system using adaptive spatiotemporal convolution feature representation with dynamic abstraction for video to language translation

A video retrieval system is provided, that includes a set of servers, configured to retrieve a video sequence from a database and forward it to a requesting device responsive to a match between an input text and a caption for the video sequence. The servers are further configured to translate the vi...

Full description

Saved in:

Bibliographic Details
Main Authors	Min, Renqiang, Pu, Yunchen
Format	Patent
Language	English
Published	03.09.2019
Subjects	CALCULATING COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS COMPUTING COUNTING ELECTRIC COMMUNICATION TECHNIQUE ELECTRICITY HANDLING RECORD CARRIERS PHYSICS PICTORIAL COMMUNICATION, e.g. TELEVISION PRESENTATION OF DATA RECOGNITION OF DATA RECORD CARRIERS
Online Access	Get full text

Cover

Loading…

More Information
Summary:	A video retrieval system is provided, that includes a set of servers, configured to retrieve a video sequence from a database and forward it to a requesting device responsive to a match between an input text and a caption for the video sequence. The servers are further configured to translate the video sequence into the caption by (A) applying a C3D to image frames of the video sequence to obtain therefor (i) intermediate feature representations across L convolutional layers and (ii) top-layer features, (B) producing a first word of the caption for the video sequence by applying the top-layer features to a LSTM, and (C) producing subsequent words of the caption by (i) dynamically performing spatiotemporal attention and layer attention using the representations to form a context vector, and (ii) applying the LSTM to the context vector, a previous word of the caption, and a hidden state of the LSTM.
Bibliography:	Application Number: US201715794802