SOrT-ing VQA Models : Contrastive Gradient Learning for Improved Consistency

Bibliographic Details
Main Authors	Dharur, Sameer; Tendulkar, Purva; Batra, Dhruv; Parikh, Devi; Selvaraju, Ramprasaath R
Format	Journal Article
Language	English
Published	20.10.2020
Subjects	Computer Science - Artificial Intelligence; Computer Science - Computation and Language; Computer Science - Computer Vision and Pattern Recognition; Computer Science - Learning

Abstract	Recent research in Visual Question Answering (VQA) has revealed state-of-the-art models to be inconsistent in their understanding of the world -- they answer seemingly difficult questions requiring reasoning correctly but get simpler associated sub-questions wrong. These sub-questions pertain to lower level visual concepts in the image that models ideally should understand to be able to answer the higher level question correctly. To address this, we first present a gradient-based interpretability approach to determine the questions most strongly correlated with the reasoning question on an image, and use this to evaluate VQA models on their ability to identify the relevant sub-questions needed to answer a reasoning question. Next, we propose a contrastive gradient learning-based approach called Sub-question Oriented Tuning (SOrT), which encourages models to rank relevant sub-questions higher than irrelevant questions for an <image, reasoning-question> pair. We show that SOrT improves model consistency by up to 6.5 percentage points over existing baselines, while also improving visual grounding.
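The ranking idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the similarity measure or the loss, so the sketch assumes each question is already represented by a gradient vector, scores candidates by cosine similarity to the reasoning question's gradient, and applies a margin ranking loss. All function names, the `margin` value, and the example vectors are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_questions(reasoning_grad, candidate_grads):
    """Order candidate question names by gradient similarity to the reasoning question.

    candidate_grads is a dict mapping a (hypothetical) question name to its
    gradient vector; the most similar question comes first.
    """
    scores = {name: cosine(reasoning_grad, grad) for name, grad in candidate_grads.items()}
    return sorted(scores, key=scores.get, reverse=True)


def margin_ranking_loss(reasoning_grad, relevant_grad, irrelevant_grad, margin=0.1):
    """Assumed contrastive objective: zero once the relevant sub-question's
    similarity exceeds the irrelevant question's similarity by at least `margin`."""
    pos = cosine(reasoning_grad, relevant_grad)
    neg = cosine(reasoning_grad, irrelevant_grad)
    return max(0.0, margin - (pos - neg))
```

In this sketch the loss is zero when the relevant sub-question already ranks above the irrelevant one by the margin, and it grows as the ordering is violated, which is the behaviour a ranking objective of this kind is meant to penalise.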
BackLink	https://doi.org/10.48550/arXiv.2010.10038
ContentType	Journal Article
Copyright	http://arxiv.org/licenses/nonexclusive-distrib/1.0
DOI	10.48550/arxiv.2010.10038
DatabaseName	arXiv Computer Science arXiv.org
IsDoiOpenAccess	true
IsOpenAccess	true
IsPeerReviewed	false
IsScholarly	false
OpenAccessLink	https://arxiv.org/abs/2010.10038
PublicationDate	2020-10-20
SecondaryResourceType	preprint
URI	https://arxiv.org/abs/2010.10038