Learning to Assemble Neural Module Tree Networks for Visual Grounding

Visual grounding, a task to ground (i.e., localize) natural language in images, essentially requires composite visual reasoning. However, existing methods over-simplify the composite nature of language into a monolithic sentence embedding or a coarse composition of subject-predicate-object triplet....

Bibliographic Details
Published in	Proceedings / IEEE International Conference on Computer Vision, pp. 4672-4681
Main Authors	Liu, Daqing; Zhang, Hanwang; Zha, Zheng-Jun; Wu, Feng
Format	Conference Proceeding
Language	English
Published	IEEE 01.10.2019
Subjects	Artificial neural networks; Cognition; Grounding; Natural languages; Task analysis; Training; Visualization
ISSN	2380-7504
DOI	10.1109/ICCV.2019.00477
