Logical Implications for Visual Question Answering Consistency
Format | Journal Article
Language | English
Published | 16.03.2023
Summary | Despite considerable recent progress in Visual Question Answering (VQA) models, inconsistent or contradictory answers continue to cast doubt on their true reasoning capabilities. However, most proposed methods use indirect strategies or strong assumptions about pairs of questions and answers to enforce model consistency. Instead, we propose a novel strategy that improves model performance by directly reducing logical inconsistencies. To do this, we introduce a new consistency loss term that can be used by a wide range of VQA models and that relies on knowing the logical relation between pairs of questions and answers. Because such information is typically not available in VQA datasets, we propose to infer these logical relations using a dedicated language model and to use them in our proposed consistency loss function. We conduct extensive experiments on the VQA Introspect and DME datasets and show that our method improves state-of-the-art VQA models while remaining robust across different architectures and settings.
DOI | 10.48550/arxiv.2303.09427
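The summary describes a consistency loss over pairs of questions whose answers stand in a logical relation (e.g., "Is there a dog?" = yes entails "Is there an animal?" = yes). The paper's exact formulation is not reproduced on this page, so the following is only a hypothetical sketch of the general idea: penalize predictions where the model is confident in the premise but denies the conclusion. The function name and the simple product form are illustrative assumptions, not the authors' loss.

```python
def implication_consistency_loss(p_premise: float, p_conclusion: float) -> float:
    """Illustrative penalty for violating 'premise implies conclusion'.

    NOTE: this is a generic sketch, not the loss from the paper.
    The product p(premise) * (1 - p(conclusion)) is large when the
    model is confident in the premise (e.g., "there is a dog") while
    denying the conclusion ("there is an animal"), and near zero
    whenever the implication is respected.
    """
    return p_premise * (1.0 - p_conclusion)

# Consistent pair: dog=yes with 0.9, animal=yes with 0.95 -> small penalty
consistent = implication_consistency_loss(0.9, 0.95)   # 0.045
# Inconsistent pair: dog=yes with 0.9, animal=yes with only 0.1 -> large penalty
inconsistent = implication_consistency_loss(0.9, 0.1)  # 0.81
```

In a training loop such a term would be added, suitably weighted, to the standard VQA answer-classification loss for question pairs whose logical relation has been inferred by the language model.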