Evolutionary recurrent neural network for image captioning

Bibliographic Details
Published in: Neurocomputing (Amsterdam), Vol. 401, pp. 249–256
Main Authors: Wang, Hanzhang; Wang, Hanli; Xu, Kaisheng
Format: Journal Article
Language: English
Published: Elsevier B.V., 11.08.2020
Summary: Automatic architecture search is an efficient way to discover novel neural networks, but it has mostly been employed for pure vision or natural language tasks. Cross-modality tasks, however, depend on the associative mechanisms between visual and language models rather than merely on a convolutional neural network (CNN) or recurrent neural network (RNN) with the best standalone performance. In this work, the intermediary associative connection is approximated by the topological inner structure of the RNN cell, which is evolved by an evolutionary algorithm using the image captioning task as a proxy. On the MSCOCO dataset, the proposed algorithm, starting from scratch, discovers more than 100 RNN variants whose performances are all above 100 CIDEr and 31 BLEU-4, with the best reaching 101.4 and 32.6, respectively. Additionally, several previously unknown interesting patterns as well as many existing powerful structures are found in the generated RNNs. The patterns of operation and connection in the generated architectures are analyzed to understand the language modeling of cross-modality compared with general RNNs.
ISSN: 0925-2312, 1872-8286
DOI: 10.1016/j.neucom.2020.03.087
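To make the search procedure described in the summary more concrete, below is a minimal, illustrative Python sketch of an evolutionary loop over RNN-cell genomes (per-node input wiring plus an activation operation). The genome encoding (CellGenome, OPS), the mutation scheme, and the stub proxy_fitness function are assumptions for illustration only; the paper's actual search space and its CIDEr-based captioning proxy on MSCOCO are not reproduced here.

    """Illustrative sketch of evolutionary search over RNN-cell topologies.
    The encoding, mutation operators, and fitness stub are assumptions made
    for this sketch; they are not the paper's actual method."""
    import random
    from dataclasses import dataclass, field

    OPS = ["tanh", "sigmoid", "relu", "identity"]  # candidate node operations (assumed)


    @dataclass
    class CellGenome:
        """Each intermediate node picks one earlier node as input and one op."""
        nodes: list = field(default_factory=list)  # list of (input_index, op)

        @staticmethod
        def random_init(num_nodes=5):
            nodes = [(random.randrange(i + 1), random.choice(OPS)) for i in range(num_nodes)]
            return CellGenome(nodes)

        def mutate(self):
            """Return a copy with one node's wiring or operation changed."""
            nodes = list(self.nodes)
            i = random.randrange(len(nodes))
            inp, op = nodes[i]
            if random.random() < 0.5:
                inp = random.randrange(i + 1)   # rewire to an earlier node
            else:
                op = random.choice(OPS)         # swap the node operation
            nodes[i] = (inp, op)
            return CellGenome(nodes)


    def proxy_fitness(genome: CellGenome) -> float:
        """Placeholder for the captioning proxy (e.g. training a small captioner
        and scoring CIDEr on a validation split). Here it is a toy heuristic so
        the sketch runs end to end; it is NOT the paper's evaluation."""
        diversity = len({op for _, op in genome.nodes}) / len(OPS)
        fan_in = len({inp for inp, _ in genome.nodes}) / len(genome.nodes)
        return diversity + fan_in + random.uniform(0.0, 0.1)


    def evolve(pop_size=20, generations=30, tournament=5, seed=0):
        """Tournament-style loop: sample a tournament, mutate the winner,
        append the child, and evaluate it with the proxy fitness."""
        random.seed(seed)
        history = [(g, proxy_fitness(g))
                   for g in (CellGenome.random_init() for _ in range(pop_size))]
        for _ in range(generations):
            sample = random.sample(history[-pop_size:], tournament)
            parent, _ = max(sample, key=lambda pair: pair[1])
            child = parent.mutate()
            history.append((child, proxy_fitness(child)))
        return max(history, key=lambda pair: pair[1])


    if __name__ == "__main__":
        best, score = evolve()
        print("best cell:", best.nodes)
        print("proxy score: %.3f" % score)

The loop follows a tournament-selection, mutate-the-winner pattern, which is one common way such architecture searches are run; the paper's actual selection strategy, population management, and proxy training details may differ.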