Deep Residual Weight-Sharing Attention Network With Low-Rank Attention for Visual Question Answering
Published in | IEEE Transactions on Multimedia, Vol. 25, pp. 4282-4295 |
---|---|
Format | Journal Article |
Language | English |
Published | Piscataway: The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 2023 |
Summary: | Attention-based networks have recently become prevalent in visual question answering (VQA) owing to their high performance. However, the extensive memory consumption of attention-based models places excessively high demands on deployment hardware, raising concerns about their future application scenarios. Designing an efficient, lightweight VQA model is therefore central to expanding the range of possible applications. This work presents a novel lightweight attention-based VQA model, the residual weight-sharing attention network (RWSAN), which consists of residual weight-sharing attention (RWSA) layers cascaded in depth. Each RWSA layer models the textual representation with self residual weight-sharing attention (SRWSA) and captures question features and question-image interactions with self-guided residual weight-sharing attention (SGRWSA). Inside each RWSA layer, the proposed low-rank attention (LRA) units perform residual learning with learned connection patterns and shared parameters, and every stacked RWSA layer also reuses the same parameters. Extensive ablation experiments with quantitative and qualitative analysis illustrate the effectiveness and generality of RWSA. Experiments on the VQA-v2, GQA, and CLEVR datasets show that RWSAN achieves competitive performance with far fewer parameters than state-of-the-art methods. Code is available at https://github.com/BrightQin/RWSAN . |
---|---|
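The abstract describes two parameter-saving ideas: low-rank attention (LRA) units and weight sharing across stacked layers. As a rough illustration only (not the authors' implementation; all dimensions, names, and the residual wiring here are assumptions), a self-attention head can be made low-rank by using d×r projections with r ≪ d, and weight sharing amounts to applying the very same projection matrices at every stacked layer:

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def low_rank_attention(x, Wq, Wk, Wv):
    # Wq, Wk, Wv are (d, r) with r << d: each projection costs d*r
    # parameters instead of the d*d of a full-rank head.
    q, k, v = x @ Wq, x @ Wk, x @ Wv                 # each (n, r)
    attn = softmax(q @ k.T / np.sqrt(Wq.shape[1]))   # (n, n)
    return attn @ v                                  # (n, r)

rng = np.random.default_rng(0)
d, r, n, layers = 64, 8, 10, 6
Wq, Wk, Wv = (rng.standard_normal((d, r)) / np.sqrt(d) for _ in range(3))
Wu = rng.standard_normal((r, d)) / np.sqrt(r)        # map back to width d

x = rng.standard_normal((n, d))
for _ in range(layers):
    # weight sharing: the SAME Wq, Wk, Wv, Wu are reused at every layer,
    # combined with a residual connection.
    x = x + low_rank_attention(x, Wq, Wk, Wv) @ Wu

# Parameter count for all layers combined: 4*d*r (shared, low-rank)
# versus layers * 4*d*d for unshared full-rank heads.
shared_params = 4 * d * r
unshared_full_rank = layers * 4 * d * d
```

With these toy sizes the shared low-rank stack uses 2,048 projection parameters versus 98,304 for six unshared full-rank layers, which is the kind of reduction the record's abstract claims for RWSAN, though the paper's actual LRA units and connection patterns are more involved.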
ISSN | 1520-9210; 1941-0077 |
DOI | 10.1109/TMM.2022.3173131 |