InfographicVQA

Bibliographic Details
Published in	Proceedings / IEEE Workshop on Applications of Computer Vision, pp. 2582-2591
Main Authors	Mathew, Minesh; Bagal, Viraj; Tito, Ruben; Karatzas, Dimosthenis; Valveny, Ernest; Jawahar, C. V.
Format	Conference Proceeding
Language	English
Published	IEEE 01.01.2022
Subjects	Benchmark testing; Brain modeling; Computational modeling; Computer vision; Data visualization; Document Analysis Datasets; Evaluation and Comparison of Vision Algorithms; Vision and Languages; Layout; Visualization
Summary:	Infographics communicate information using a combination of textual, graphical and visual elements. This work explores the automatic understanding of infographic images by using a Visual Question Answering technique. To this end, we present InfographicVQA, a new dataset comprising a diverse collection of infographics and question-answer annotations. The questions require methods that jointly reason over the document layout, textual content, graphical elements, and data visualizations. We curate the dataset with an emphasis on questions that require elementary reasoning and basic arithmetic skills. For VQA on the dataset, we evaluate two strong Transformer-based baselines. Both baselines yield unsatisfactory results compared to near-perfect human performance on the dataset. The results suggest that VQA on infographics (images designed to communicate information quickly and clearly to the human brain) is ideal for benchmarking machine understanding of complex document images. The dataset is available for download at docvqa.org
ISSN:	2642-9381
DOI:	10.1109/WACV51458.2022.00264