A Visual Question Answering Network Merging High- and Low-Level Semantic Information

Bibliographic Details
Published in IEICE Transactions on Information and Systems Vol. E106.D; no. 5; pp. 581-589
Main Authors LI, Huimin, HAN, Dezhi, CHEN, Chongqing, CHANG, Chin-Chen, LI, Kuan-Ching, LI, Dun
Format Journal Article
Language English
Published Tokyo: The Institute of Electronics, Information and Communication Engineers, 01.05.2023
Japan Science and Technology Agency

Summary: Visual Question Answering (VQA) usually uses deep attention mechanisms to learn fine-grained visual content of images and textual content of questions. However, deep attention mechanisms learn only high-level semantic information and ignore the impact of low-level semantic information on answer prediction. To address this, we design a High- and Low-Level Semantic Information Network (HLSIN), which employs two strategies to fuse high-level and low-level semantic information. The first strategy, adaptive weight learning, allows each level of semantic information to learn its own weight. The second, a gate-sum mechanism, suppresses invalid information at each level and fuses the valid information. On the benchmark VQA-v2 dataset, we evaluate HLSIN quantitatively and qualitatively and conduct extensive ablation studies to explore the reasons behind its effectiveness. Experimental results demonstrate that HLSIN significantly outperforms the previous state of the art, with an overall accuracy of 70.93% on test-dev.
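
To make the fusion idea concrete, below is a minimal PyTorch-style sketch of a gate-sum fusion of high- and low-level features with learned per-level weights. It is an illustrative assumption based on the summary above, not the authors' implementation: the module name GateSumFusion, the 512-dimensional feature size, and the use of scalar softmax-normalized level weights are all hypothetical choices.

    import torch
    import torch.nn as nn

    class GateSumFusion(nn.Module):
        """Sketch: gate each feature level to suppress invalid information,
        weight the levels adaptively, and sum them (assumed design)."""

        def __init__(self, dim: int):
            super().__init__()
            # Sigmoid gates, one per semantic level.
            self.gate_high = nn.Sequential(nn.Linear(dim, dim), nn.Sigmoid())
            self.gate_low = nn.Sequential(nn.Linear(dim, dim), nn.Sigmoid())
            # Adaptive per-level weights (scalar per level; an assumption).
            self.level_weights = nn.Parameter(torch.ones(2))

        def forward(self, high: torch.Tensor, low: torch.Tensor) -> torch.Tensor:
            w = torch.softmax(self.level_weights, dim=0)   # adaptive weight learning
            gated_high = self.gate_high(high) * high       # suppress invalid high-level info
            gated_low = self.gate_low(low) * low           # suppress invalid low-level info
            return w[0] * gated_high + w[1] * gated_low    # gate-sum fusion

    if __name__ == "__main__":
        fuse = GateSumFusion(dim=512)
        high = torch.randn(8, 512)   # e.g., deep attention output
        low = torch.randn(8, 512)    # e.g., shallow-layer features
        print(fuse(high, low).shape)  # torch.Size([8, 512])

The fused representation would then feed the answer-prediction head in place of the high-level features alone.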
ISSN:0916-8532
1745-1361
DOI:10.1587/transinf.2022DLP0002