Automatic Video Description Generation Driven by Relation Mining (关系挖掘驱动的视频描述自动生成)
Published in: Nanjing Xinxi Gongcheng Daxue Xuebao, Vol. 9, No. 6, pp. 642-649
Main Authors: , ,
Format: Journal Article
Language: Chinese
Published: Nanjing: Nanjing University of Information Science & Technology, 01.12.2017
Author affiliations: National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing 100190; University of Chinese Academy of Sciences, Beijing 100049
ISSN: 1674-7070
DOI: 10.13878/j.cnki.jnuist.2017.06.008
Summary: Video description has received increasing interest in computer vision. Generating video descriptions requires natural language processing techniques, together with the capacity to handle variable lengths of both the input (the sequence of video frames) and the output (the sequence of description words). To this end, this paper draws on recent advances in machine translation and designs a two-layer LSTM (Long Short-Term Memory) model based on the encoder-decoder architecture. Since deep neural networks can learn appropriate representations of input data, we extract feature vectors from the video frames with a convolutional neural network (CNN) and use them as the input sequence of the LSTM model. Finally, we compare the influence of different feature extraction methods on the LSTM video description model. The results show that the proposed model learns to transform sequences of knowledge representations into natural language.
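The pipeline the abstract describes — encode a variable-length sequence of per-frame CNN feature vectors with an LSTM, then decode a word sequence — can be sketched as follows. This is a minimal NumPy illustration, not the authors' code: the hidden size, random weight initialization, single-layer cells, and greedy argmax decoding over a toy vocabulary are all illustrative assumptions.

```python
import numpy as np

def lstm_step(x, h, c, W, U, b):
    """One LSTM step: input/forget/output gates and candidate cell state."""
    H = h.shape[0]
    z = W @ x + U @ h + b                    # stacked pre-activations, shape (4*H,)
    i = 1.0 / (1.0 + np.exp(-z[:H]))         # input gate
    f = 1.0 / (1.0 + np.exp(-z[H:2*H]))      # forget gate
    o = 1.0 / (1.0 + np.exp(-z[2*H:3*H]))    # output gate
    g = np.tanh(z[3*H:])                     # candidate cell update
    c_new = f * c + i * g
    h_new = o * np.tanh(c_new)
    return h_new, c_new

def encode_decode(frame_feats, vocab_size, H=8, max_words=5, rng=None):
    """Encode a variable-length sequence of frame feature vectors, then
    greedily decode word indices from a linear readout of the hidden state."""
    if rng is None:
        rng = np.random.default_rng(0)
    D = frame_feats.shape[1]
    # Toy random weights; a real model would learn these by backpropagation.
    We, Ue, be = rng.normal(0, 0.1, (4*H, D)), rng.normal(0, 0.1, (4*H, H)), np.zeros(4*H)
    Wd, Ud, bd = rng.normal(0, 0.1, (4*H, H)), rng.normal(0, 0.1, (4*H, H)), np.zeros(4*H)
    Wout = rng.normal(0, 0.1, (vocab_size, H))   # hidden state -> word scores
    Emb = rng.normal(0, 0.1, (vocab_size, H))    # word index -> embedding
    h, c = np.zeros(H), np.zeros(H)
    for x in frame_feats:                        # encoder: one step per frame
        h, c = lstm_step(x, h, c, We, Ue, be)
    words, prev = [], np.zeros(H)
    for _ in range(max_words):                   # decoder: one step per word
        h, c = lstm_step(prev, h, c, Wd, Ud, bd)
        w = int(np.argmax(Wout @ h))             # greedy word choice
        words.append(w)
        prev = Emb[w]                            # feed chosen word back in
    return words

# Usage: 10 frames of 16-dim (hypothetical) CNN features -> 5 word indices.
feats = np.random.default_rng(1).normal(size=(10, 16))
caption = encode_decode(feats, vocab_size=20)
```

Note how the frame count (10 here) never appears in the model's weights: the encoder simply iterates one step per frame, which is what lets both input and output lengths vary.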