Suppressing Biased Samples for Robust VQA

Bibliographic Details
Published in: IEEE Transactions on Multimedia, Vol. 24, pp. 3405-3415
Main Authors: Ouyang, Ninglin; Huang, Qingbao; Li, Pijian; Cai, Yi; Liu, Bin; Leung, Ho-fung; Li, Qing
Format: Journal Article
Language: English
Published: Piscataway: The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 2022
Summary: Most existing visual question answering (VQA) models rely strongly on language bias to answer questions, i.e., they tend to fit question-answer pairs on the train split and perform poorly on the test split when the answer distributions differ. This behavior makes them difficult to apply in real-world scenarios. To reduce language biases, previous studies mainly integrate modules to overcome language priors (ensemble-based methods) or generate additional training data to balance dataset biases (data-balanced methods). However, existing ensemble-based methods all lose accuracy on the VQA v2 dataset, while data-balanced methods may introduce new biases and cannot guarantee the quality of the generated data. In this paper, we propose a model-agnostic training scheme called Suppressing Biased Samples (SBS) to overcome language priors. SBS consists of two collaborative parts: a Data Classifier Module, which divides the dataset into biased and unbiased samples using similarity in the semantic space, and a Bias Penalty Module, which suppresses the biased samples to weaken their influence. As a new way of balancing data to address language bias, SBS overcomes the shortcomings of previous data-balanced methods. Experimental results show that our method can be merged into other bias-reduction methods and achieves new state-of-the-art performance on the commonly used VQA-CP v2 dataset.
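To illustrate the two collaborative parts described in the abstract, the sketch below shows what one training step of an SBS-style scheme could look like. It is not the authors' implementation: the model interface, the per-answer prototypes used for the semantic-space similarity, the threshold `sim_threshold`, and the penalty weight `lambda_bias` are all assumptions introduced here for illustration.

```python
# Illustrative sketch (not the paper's code) of a biased-sample suppression step,
# assuming a VQA model that returns answer logits plus a question embedding, and
# precomputed per-answer prototype vectors in the same semantic space.
import torch
import torch.nn.functional as F

def sbs_training_step(model, batch, answer_prototypes,
                      sim_threshold=0.8, lambda_bias=0.1):
    """Flag samples whose question embedding is close to the prototype of their
    ground-truth answer as 'biased', then down-weight their loss."""
    images, questions, answers = batch                 # tensors from a VQA dataloader
    logits, q_embed = model(images, questions)         # assumed model interface

    # Data Classifier (sketch): cosine similarity in the semantic space between
    # each question embedding and the prototype of its ground-truth answer.
    proto = answer_prototypes[answers]                 # (B, D) prototype per sample
    sim = F.cosine_similarity(q_embed, proto, dim=-1)  # (B,)
    is_biased = sim > sim_threshold                    # boolean mask of biased samples

    # Bias Penalty (sketch): suppress biased samples by scaling their loss.
    per_sample_loss = F.cross_entropy(logits, answers, reduction="none")
    weights = torch.where(is_biased,
                          torch.full_like(sim, lambda_bias),
                          torch.ones_like(sim))
    return (weights * per_sample_loss).mean()
```

Because the scheme only re-weights the per-sample loss, it can in principle be wrapped around any base VQA model, which is consistent with the abstract's claim that SBS is model-agnostic and can be merged into other bias-reduction methods.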
ISSN: 1520-9210, 1941-0077
DOI: 10.1109/TMM.2021.3097502