Deep Residual Weight-Sharing Attention Network With Low-Rank Attention for Visual Question Answering

Bibliographic Details
Published in: IEEE Transactions on Multimedia, Vol. 25, pp. 4282-4295
Main Authors: Qin, Bosheng; Hu, Haoji; Zhuang, Yueting
Format: Journal Article
Language: English
Published: Piscataway: The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 2023

Summary: Attention-based networks have recently become prevalent in visual question answering (VQA) due to their high performance. However, the extensive memory consumption of attention-based models places excessively high demands on deployment hardware, raising concerns about their future application scenarios. Designing an efficient, lightweight VQA model is therefore central to expanding the range of possible applications. Our work presents a novel lightweight attention-based VQA model, the residual weight-sharing attention network (RWSAN), consisting of residual weight-sharing attention (RWSA) layers cascaded in depth. Each RWSA layer models the textual representation with self residual weight-sharing attention (SRWSA) and captures question features and question-image interactions with self-guided residual weight-sharing attention (SGRWSA). Inside each RWSA layer, the proposed low-rank attention (LRA) units perform residual learning with learned connection patterns and shared parameters, and every stacked RWSA layer reuses the same parameters. Extensive ablation experiments with quantitative and qualitative analysis illustrate the effectiveness and generality of RWSA. Experiments on the VQA-v2, GQA, and CLEVR datasets show that RWSAN achieves competitive performance with far fewer parameters than state-of-the-art methods. We release our code at https://github.com/BrightQin/RWSAN .
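The two ideas the abstract combines, low-rank attention and weight sharing across stacked layers, can be illustrated with a minimal NumPy sketch. This is not the authors' LRA unit (their learned connection patterns and the SRWSA/SGRWSA split are omitted); it only shows, under assumed shapes, how factoring each d x d projection into rank-r factors and reusing one layer at every depth shrinks the parameter count.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax over the given axis
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

class LowRankAttention:
    """Illustrative low-rank self-attention with a residual connection.

    Each d x d projection matrix W is factored as U @ V with
    U: (d, r) and V: (r, d), so the query/key/value projections
    cost 3 * 2 * d * r parameters instead of 3 * d * d.
    """
    def __init__(self, d, r, rng):
        self.d = d
        self.factors = {
            name: (rng.standard_normal((d, r)) / np.sqrt(d),
                   rng.standard_normal((r, d)) / np.sqrt(r))
            for name in ("q", "k", "v")
        }

    def project(self, x, name):
        U, V = self.factors[name]
        return x @ U @ V  # low-rank approximation of x @ W

    def __call__(self, x):
        q = self.project(x, "q")
        k = self.project(x, "k")
        v = self.project(x, "v")
        attn = softmax(q @ k.T / np.sqrt(self.d))
        return x + attn @ v  # residual learning around attention

rng = np.random.default_rng(0)
layer = LowRankAttention(d=64, r=8, rng=rng)

x = rng.standard_normal((10, 64))  # 10 tokens, 64-dim features
out = x
# weight sharing in depth: every "stacked" layer is the SAME object,
# so 4 applications add no parameters beyond the single layer
for _ in range(4):
    out = layer(out)
```

With d=64 and r=8, the three factored projections hold 3 * 2 * 64 * 8 = 3,072 parameters versus 3 * 64 * 64 = 12,288 for full-rank projections, and sharing the layer across all four depths keeps that count fixed regardless of network depth.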
ISSN: 1520-9210, 1941-0077
DOI: 10.1109/TMM.2022.3173131