Learning to Assemble Neural Module Tree Networks for Visual Grounding

Visual grounding, a task to ground (i.e., localize) natural language in images, essentially requires composite visual reasoning. However, existing methods over-simplify the composite nature of language into a monolithic sentence embedding or a coarse composition of subject-predicate-object triplet....

Bibliographic Details
Published in Proceedings / IEEE International Conference on Computer Vision, pp. 4672-4681
Main Authors Liu, Daqing; Zhang, Hanwang; Zha, Zheng-Jun; Wu, Feng
Format Conference Proceeding
Language English
Published IEEE 01.10.2019
ISSN 2380-7504
DOI 10.1109/ICCV.2019.00477
