Human Attention in Visual Question Answering: Do Humans and Deep Networks Look at the Same Regions?
| Published in | Computer Vision and Image Understanding, Vol. 163, pp. 90–100 |
| --- | --- |
| Main Authors | |
| Format | Journal Article |
| Language | English |
| Published | Elsevier Inc., 01.10.2017 |
Summary:

• Multiple game-inspired novel interfaces for collecting human attention maps of where humans choose to look to answer questions from the large-scale VQA dataset (Antol et al., 2015).
• Qualitative and quantitative comparison of the maps generated by state-of-the-art attention-based VQA models (Yang et al., 2016; Lu et al., 2016) and a task-independent saliency baseline (Judd et al., 2009) against our human attention maps, through visualizations and rank-order correlation.
• A VQA model trained with explicit supervision for attention, using our human attention maps as ground truth.
We conduct large-scale studies on ‘human attention’ in Visual Question Answering (VQA) to understand where humans choose to look to answer questions about images. We design and test multiple game-inspired novel attention-annotation interfaces that require the subject to sharpen regions of a blurred image to answer a question. Thus, we introduce the VQA-HAT (Human ATtention) dataset. We evaluate attention maps generated by state-of-the-art VQA models against human attention both qualitatively (via visualizations) and quantitatively (via rank-order correlation). Our experiments show that current attention models in VQA do not seem to be looking at the same regions as humans. Finally, we train VQA models with explicit attention supervision, and find that it improves VQA performance.
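As a rough illustration of the rank-order correlation evaluation mentioned in the abstract, the sketch below compares a model-generated attention map against a human attention map using Spearman's rank correlation. The function name, the 14×14 map resolution, and the use of `scipy.stats.spearmanr` are assumptions made for this example, not the authors' released implementation.

```python
# Minimal sketch (assumptions noted above): flatten two attention maps of
# equal shape, rank their pixel weights, and compute Spearman's rank-order
# correlation between the rankings.
import numpy as np
from scipy.stats import spearmanr


def rank_correlation(model_attention: np.ndarray, human_attention: np.ndarray) -> float:
    """Spearman rank-order correlation between two attention maps of equal shape."""
    assert model_attention.shape == human_attention.shape
    rho, _ = spearmanr(model_attention.ravel(), human_attention.ravel())
    return rho


if __name__ == "__main__":
    # Illustrative usage with random 14x14 grids; the actual spatial
    # resolution of the attention maps is an assumption here.
    rng = np.random.default_rng(0)
    model_map = rng.random((14, 14))
    human_map = rng.random((14, 14))
    print(f"Rank correlation: {rank_correlation(model_map, human_map):.3f}")
```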
ISSN: 1077-3142; 1090-235X
DOI: 10.1016/j.cviu.2017.10.001