VizWiz Grand Challenge: Answering Visual Questions from Blind People

Bibliographic Details
Published in 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 3608-3617
Main Authors Gurari, Danna; Li, Qing; Stangl, Abigale J.; Guo, Anhong; Lin, Chi; Grauman, Kristen; Luo, Jiebo; Bigham, Jeffrey P.
Format Conference Proceeding
Language English
Published IEEE 01.06.2018

Abstract The study of algorithms to automatically answer visual questions is currently motivated by visual question answering (VQA) datasets constructed in artificial VQA settings. We propose VizWiz, the first goal-oriented VQA dataset arising from a natural VQA setting. VizWiz consists of over 31,000 visual questions originating from blind people who each took a picture using a mobile phone and recorded a spoken question about it, together with 10 crowdsourced answers per visual question. VizWiz differs from the many existing VQA datasets because (1) images are captured by blind photographers and so are often of poor quality, (2) questions are spoken and so are more conversational, and (3) visual questions often cannot be answered. Evaluation of modern algorithms for answering visual questions and deciding if a visual question is answerable reveals that VizWiz is a challenging dataset. We introduce this dataset to encourage a larger community to develop more generalized algorithms that can assist blind people.
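Each visual question in the abstract comes with 10 crowdsourced answers, which lets a predicted answer be scored against a human consensus rather than a single label. The abstract does not name an evaluation metric. The sketch below therefore assumes the VQA-style accuracy commonly used with 10-answer datasets: for each annotator, that annotator's answer is held out, the prediction earns min(matches among the other nine / 3, 1), and the final score is the mean over all ten rounds. The function names `normalize` and `vqa_accuracy` and the trim-and-lowercase normalization are illustrative assumptions, not the official VizWiz evaluation code, which applies more extensive answer normalization.

```python
def normalize(answer: str) -> str:
    """Simplified answer normalization (assumption): trim and lowercase only."""
    return answer.strip().lower()


def vqa_accuracy(predicted: str, human_answers: list[str]) -> float:
    """VQA-style accuracy of one predicted answer against crowdsourced answers.

    For each annotator, drop that annotator's answer, count how many of the
    remaining answers match the prediction, and score min(matches / 3, 1).
    The result is the mean score over all leave-one-out rounds.
    """
    if not human_answers:
        raise ValueError("need at least one human answer")
    pred = normalize(predicted)
    answers = [normalize(a) for a in human_answers]
    round_scores = []
    for i in range(len(answers)):
        # Hold out annotator i and compare against the remaining answers.
        others = answers[:i] + answers[i + 1:]
        matches = sum(1 for a in others if a == pred)
        round_scores.append(min(matches / 3.0, 1.0))
    return sum(round_scores) / len(round_scores)


if __name__ == "__main__":
    # Ten hypothetical crowdsourced answers for one visual question.
    answers = ["red", "Red"] + ["blue"] * 8
    print(round(vqa_accuracy("red", answers), 3))  # prints 0.6
```

Dividing by 3 means a prediction earns full credit in a round once at least three of the other annotators gave the same answer, so partial agreement still earns partial credit. The leave-one-out averaging keeps every annotator's answer in play; a simpler variant skips it and scores min(matches / 3, 1) against all ten answers at once.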
CODEN IEEPAD
ContentType Conference Proceeding
DOI 10.1109/CVPR.2018.00380
DatabaseName IEEE Electronic Library (IEL) Conference Proceedings
IEEE Proceedings Order Plan (POP) 1998-present by volume
IEEE Xplore All Conference Proceedings
IEEE Electronic Library (IEL)
IEEE Proceedings Order Plans (POP) 1998-present
DeliveryMethod fulltext_linktorsrc
Discipline Applied Sciences
EISBN 9781538664209
1538664208
EISSN 1063-6919
EndPage 3617
ExternalDocumentID 8578478
Genre orig-research
IngestDate Wed Aug 27 02:52:16 EDT 2025
IsPeerReviewed false
IsScholarly true
Language English
LinkModel DirectLink
PageCount 10
ParticipantIDs ieee_primary_8578478
PublicationCentury 2000
PublicationDate 2018-Jun
PublicationDateYYYYMMDD 2018-06-01
PublicationDate_xml – month: 06
  year: 2018
  text: 2018-Jun
PublicationDecade 2010
PublicationTitle 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition
PublicationTitleAbbrev CVPR
PublicationYear 2018
Publisher IEEE
Publisher_xml – name: IEEE
SourceID ieee
SourceType Publisher
StartPage 3608
SubjectTerms Blindness
Computer vision
Lighting
Mobile handsets
Prediction algorithms
Shape
Visualization
Title VizWiz Grand Challenge: Answering Visual Questions from Blind People
URI https://ieeexplore.ieee.org/document/8578478