SOrT-ing VQA Models : Contrastive Gradient Learning for Improved Consistency

Bibliographic Details
Main Authors Dharur, Sameer; Tendulkar, Purva; Batra, Dhruv; Parikh, Devi; Selvaraju, Ramprasaath R
Format Journal Article
Language English
Published 20.10.2020
Abstract Recent research in Visual Question Answering (VQA) has revealed state-of-the-art models to be inconsistent in their understanding of the world -- they correctly answer seemingly difficult questions that require reasoning, but get simpler associated sub-questions wrong. These sub-questions pertain to lower-level visual concepts in the image that models ideally should understand in order to answer the higher-level question correctly. To address this, we first present a gradient-based interpretability approach to determine the questions most strongly correlated with the reasoning question on an image, and use this to evaluate VQA models on their ability to identify the relevant sub-questions needed to answer a reasoning question. Next, we propose a contrastive gradient learning-based approach called Sub-question Oriented Tuning (SOrT), which encourages models to rank relevant sub-questions higher than irrelevant questions for an <image, reasoning-question> pair. We show that SOrT improves model consistency by up to 6.5 percentage points over existing baselines, while also improving visual grounding.
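The ranking idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not say how gradients are extracted, how they are compared, or what form the contrastive loss takes. The sketch assumes each question is already summarized by a gradient vector (plain lists of floats here), uses cosine similarity as the comparison, and uses a hinge-style margin loss as a stand-in for the contrastive objective. The function names, the `margin` parameter, and the toy vectors are all hypothetical.

```python
import math
from typing import Dict, List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_questions(
    reasoning_grad: Sequence[float],
    candidate_grads: Dict[str, Sequence[float]],
) -> List[str]:
    """Order candidate questions from most to least similar to the reasoning question.

    Assumption: similarity between gradient vectors is used as the relevance score.
    """
    return sorted(
        candidate_grads,
        key=lambda q: cosine(reasoning_grad, candidate_grads[q]),
        reverse=True,
    )


def margin_ranking_loss(
    reasoning_grad: Sequence[float],
    relevant_grad: Sequence[float],
    irrelevant_grad: Sequence[float],
    margin: float = 0.1,
) -> float:
    """Hinge-style loss: zero once the relevant sub-question outscores the
    irrelevant question by at least `margin` (an assumed hyperparameter)."""
    pos = cosine(reasoning_grad, relevant_grad)
    neg = cosine(reasoning_grad, irrelevant_grad)
    return max(0.0, margin - pos + neg)


# Toy, made-up gradient vectors for one <image, reasoning-question> pair.
reasoning = [1.0, 0.0, 1.0]
candidates = {
    "relevant sub-question": [0.9, 0.1, 0.8],
    "irrelevant question": [-0.2, 1.0, 0.0],
}
ranked = rank_questions(reasoning, candidates)
loss = margin_ranking_loss(
    reasoning,
    candidates["relevant sub-question"],
    candidates["irrelevant question"],
)
```

In this toy case the relevant sub-question already ranks first, so the loss is zero; swapping the two candidate vectors in the loss call gives a positive value, which is the signal a training step would reduce.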
Author Tendulkar, Purva
Parikh, Devi
Batra, Dhruv
Dharur, Sameer
Selvaraju, Ramprasaath R
Author_xml – sequence: 1
  givenname: Sameer
  surname: Dharur
  fullname: Dharur, Sameer
– sequence: 2
  givenname: Purva
  surname: Tendulkar
  fullname: Tendulkar, Purva
– sequence: 3
  givenname: Dhruv
  surname: Batra
  fullname: Batra, Dhruv
– sequence: 4
  givenname: Devi
  surname: Parikh
  fullname: Parikh, Devi
– sequence: 5
  givenname: Ramprasaath R
  surname: Selvaraju
  fullname: Selvaraju, Ramprasaath R
BackLink https://doi.org/10.48550/arXiv.2010.10038 (View paper in arXiv)
ContentType Journal Article
Copyright http://arxiv.org/licenses/nonexclusive-distrib/1.0
Copyright_xml – notice: http://arxiv.org/licenses/nonexclusive-distrib/1.0
DBID AKY
GOX
DOI 10.48550/arxiv.2010.10038
DatabaseName arXiv Computer Science
arXiv.org
DatabaseTitleList
Database_xml – sequence: 1
  dbid: GOX
  name: arXiv.org
  url: http://arxiv.org/find
  sourceTypes: Open Access Repository
DeliveryMethod fulltext_linktorsrc
ExternalDocumentID 2010_10038
GroupedDBID AKY
GOX
ID FETCH-LOGICAL-a678-203d243c5066cbe63736277d36dc3a3732f0072badd62651babcdfce8ab1930c3
IEDL.DBID GOX
IngestDate Mon Jan 08 05:40:04 EST 2024
IsDoiOpenAccess true
IsOpenAccess true
IsPeerReviewed false
IsScholarly false
Language English
LinkModel DirectLink
MergedId FETCHMERGED-LOGICAL-a678-203d243c5066cbe63736277d36dc3a3732f0072badd62651babcdfce8ab1930c3
OpenAccessLink https://arxiv.org/abs/2010.10038
ParticipantIDs arxiv_primary_2010_10038
PublicationCentury 2000
PublicationDate 2020-10-20
PublicationDateYYYYMMDD 2020-10-20
PublicationDate_xml – month: 10
  year: 2020
  text: 2020-10-20
  day: 20
PublicationDecade 2020
PublicationYear 2020
Score 1.780591
SecondaryResourceType preprint
Snippet Recent research in Visual Question Answering (VQA) has revealed state-of-the-art models to be inconsistent in their understanding of the world -- they answer...
SourceID arxiv
SourceType Open Access Repository
SubjectTerms Computer Science - Artificial Intelligence
Computer Science - Computation and Language
Computer Science - Computer Vision and Pattern Recognition
Computer Science - Learning
Title SOrT-ing VQA Models : Contrastive Gradient Learning for Improved Consistency
URI https://arxiv.org/abs/2010.10038
hasFullText 1
inHoldings 1
isFullTextHit
isPrint
link http://utb.summon.serialssolutions.com/2.0.0/link/0/eLvHCXMwdV1NS8QwEB129-RFFJX1kxy8Ftuk7aZ7W8TdRcRFrNJbySSpCFKlraL_3kla0YvHJHPJJMx7QyZvAM7RVJmstA2MyChBkRwDlQgehE4sDUWkIuWrLW7T9UN8XSTFCNjPXxjVfD5_9PrA2F74yqvIvV6NYcy5K9labYr-cdJLcQ32v3bEMf3UH5BY7sD2wO7Yoj-OXRjZeg9u7jdNHhBEsMe7BXPNx15aNmdOF6pRrQs3bNX4yquODXKnT4y4JOsTfmuY76rZOnb7tQ_58iq_XAdDE4NAEQ7QJRSGx0InBO0abSpmhBizmRGp0ULRiFdOvBspzFBqkUSoUBvynVRI1CrU4gAm9Wttp8AwURJlJlFjHBtuVFVJm8bu6yuxuio5hKnfevnW61SUziul98rR_0vHsMVdCknhmIcnMOmad3tKONvhmXf2N07dfPM
link.rule.ids 228,230,786,891
linkProvider Cornell University
openUrl ctx_ver=Z39.88-2004&ctx_enc=info%3Aofi%2Fenc%3AUTF-8&rfr_id=info%3Asid%2Fsummon.serialssolutions.com&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=SOrT-ing+VQA+Models+%3A+Contrastive+Gradient+Learning+for+Improved+Consistency&rft.au=Dharur%2C+Sameer&rft.au=Tendulkar%2C+Purva&rft.au=Batra%2C+Dhruv&rft.au=Parikh%2C+Devi&rft.date=2020-10-20&rft_id=info:doi/10.48550%2Farxiv.2010.10038&rft.externalDocID=2010_10038