Relation-Mining-Driven Automatic Generation of Video Descriptions

Bibliographic Details
Published in: Nanjing Xinxi Gongcheng Daxue Xuebao, Vol. 9, No. 6, pp. 642-649
Main Authors: Huang Yi, Bao Bingkun, Xu Changsheng
Format: Journal Article
Language: Chinese
Published: Nanjing: Nanjing University of Information Science & Technology, 01.12.2017
Affiliations: National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China
University of Chinese Academy of Sciences, Beijing 100049, China
ISSN: 1674-7070
DOI: 10.13878/j.cnki.jnuist.2017.06.008

Summary: Video description has received increasing interest in the field of computer vision. Generating video descriptions requires natural language processing techniques and the ability to handle variable lengths of both the input (a sequence of video frames) and the output (a sequence of description words). To this end, this paper draws on recent advances in machine translation and designs a two-layer LSTM (Long Short-Term Memory) model based on the encoder-decoder architecture. Since deep neural networks can learn appropriate representations of input data, we extract feature vectors from the video frames with a convolutional neural network (CNN) and use them as the input sequence of the LSTM model. Finally, we compare the influence of different feature extraction methods on the LSTM video description model. The results show that the proposed model can learn to transform a sequence of knowledge representations into natural language.
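The pipeline the summary describes — CNN frame features fed to an LSTM encoder, followed by an LSTM decoder that emits description words one at a time — can be sketched as below. This is a minimal illustrative sketch, not the authors' implementation: the LSTM cell, dimensions, vocabulary, and random (untrained) weights are all assumptions, and the CNN features are stand-in random vectors.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class LSTMCell:
    """Minimal LSTM cell (illustrative only, weights untrained)."""
    def __init__(self, input_dim, hidden_dim):
        self.hidden_dim = hidden_dim
        # One stacked weight matrix for the four gates (input, forget, cell, output)
        self.W = rng.standard_normal((4 * hidden_dim, input_dim + hidden_dim)) * 0.1
        self.b = np.zeros(4 * hidden_dim)

    def step(self, x, h, c):
        z = self.W @ np.concatenate([x, h]) + self.b
        i, f, g, o = np.split(z, 4)
        i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)
        c_new = f * c + i * np.tanh(g)       # update memory cell
        h_new = o * np.tanh(c_new)           # emit new hidden state
        return h_new, c_new

def describe_video(frame_features, vocab, hidden_dim=16, max_words=5):
    """Encoder-decoder over a variable-length frame-feature sequence
    (hypothetical dimensions and vocabulary)."""
    feat_dim = frame_features.shape[1]
    V = len(vocab)
    enc = LSTMCell(feat_dim, hidden_dim)     # layer 1: encodes CNN features
    dec = LSTMCell(V, hidden_dim)            # layer 2: generates words
    W_out = rng.standard_normal((V, hidden_dim)) * 0.1

    # Encoding: fold the frame sequence into a fixed-size state (h, c)
    h = c = np.zeros(hidden_dim)
    for feat in frame_features:
        h, c = enc.step(feat, h, c)

    # Decoding: greedy word-by-word generation, seeded with a <bos> one-hot
    words, x = [], np.eye(V)[vocab.index("<bos>")]
    for _ in range(max_words):
        h, c = dec.step(x, h, c)
        idx = int(np.argmax(W_out @ h))      # most likely next word
        if vocab[idx] == "<eos>":
            break
        words.append(vocab[idx])
        x = np.eye(V)[idx]                   # feed the word back in
    return words

# Toy run: 8 frames of 32-d stand-in CNN features, tiny illustrative vocabulary
vocab = ["<bos>", "<eos>", "a", "man", "plays", "guitar"]
features = rng.standard_normal((8, 32))
print(describe_video(features, vocab))
```

With untrained weights the output words are arbitrary; training would fit the gate weights and output projection on paired video/sentence data, which is what distinguishes the paper's model from this skeleton.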