Did Aristotle Use a Laptop? A Question Answering Benchmark with Implicit Reasoning Strategies

A key limitation in current datasets for is that the required steps for answering the question are mentioned in it . In this work, we introduce S QA, a question answering (QA) benchmark where the required reasoning steps are in the question, and should be inferred using a . A fundamental challenge i...

Full description

Saved in:
Bibliographic Details
Published inTransactions of the Association for Computational Linguistics Vol. 9; pp. 346 - 361
Main Authors Geva, Mor, Khashabi, Daniel, Segal, Elad, Khot, Tushar, Roth, Dan, Berant, Jonathan
Format Journal Article
LanguageEnglish
Published One Rogers Street, Cambridge, MA 02142-1209, USA MIT Press 01.01.2021
MIT Press Journals, The
The MIT Press
Subjects
Online AccessGet full text

Cover

Loading…
More Information
Summary:A key limitation in current datasets for is that the required steps for answering the question are mentioned in it . In this work, we introduce S QA, a question answering (QA) benchmark where the required reasoning steps are in the question, and should be inferred using a . A fundamental challenge in this setup is how to elicit such creative questions from crowdsourcing workers, while covering a broad range of potential strategies. We propose a data collection procedure that combines term-based priming to inspire annotators, careful control over the annotator population, and adversarial filtering for eliminating reasoning shortcuts. Moreover, we annotate each question with (1) a decomposition into reasoning steps for answering it, and (2) Wikipedia paragraphs that contain the answers to each step. Overall, S QA includes 2,780 examples, each consisting of a strategy question, its decomposition, and evidence paragraphs. Analysis shows that questions in S QA are short, topic-diverse, and cover a wide range of strategies. Empirically, we show that humans perform well (87%) on this task, while our best baseline reaches an accuracy of ∼ 66 .
Bibliography:2021
ObjectType-Article-1
SourceType-Scholarly Journals-1
ObjectType-Feature-2
content type line 14
ISSN:2307-387X
2307-387X
DOI:10.1162/tacl_a_00370