Structure-Aware Procedural Text Generation From an Image Sequence

Bibliographic Details
Published in: IEEE Access, Vol. 9, pp. 2125-2141
Main Authors: Nishimura, Taichi; Hashimoto, Atsushi; Ushiku, Yoshitaka; Kameko, Hirotaka; Yamakata, Yoko; Mori, Shinsuke
Format: Journal Article
Language: English
Published: Piscataway: IEEE, 2021
The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Summary: Creating new value by combining materials is an important activity for our society. From daily cooking to industrial manufacturing, we often describe how to do it as a procedural text. As pointed out by previous studies on natural language understanding, one important property of procedural text is its context dependency, which reflects the merging operations of materials and can be represented by a graph or tree structure. This paper investigates the impact of explicitly introducing such a structure on the vision-and-language task of procedural text generation from an image sequence. To this end, we propose (1) a new dataset, which extends the definition of a tree structure, the merging tree, to a vision-and-language version, and (2) a novel structure-aware procedural text generation model, which learns the context dependency efficiently. Experimental results show that the proposed method can boost the performance of traditional versatile methods.
ISSN: 2169-3536
DOI: 10.1109/ACCESS.2020.3043452