Natural Questions: A Benchmark for Question Answering Research
| Published in | Transactions of the Association for Computational Linguistics, Vol. 7, pp. 453-466 |
|---|---|
| Main Authors | Tom Kwiatkowski, Jennifer Palomaki, Olivia Redfield, Michael Collins, Ankur Parikh, Chris Alberti, Danielle Epstein, Illia Polosukhin, Jacob Devlin, Kenton Lee, Kristina Toutanova, Llion Jones, Matthew Kelcey, Ming-Wei Chang, Andrew M. Dai, Jakob Uszkoreit, Quoc Le, Slav Petrov |
| Format | Journal Article |
| Language | English |
| Published | MIT Press, One Rogers Street, Cambridge, MA 02142-1209, USA, 01.11.2019 |
Summary: We present the Natural Questions corpus, a question answering data set. Questions consist of real, anonymized, aggregated queries issued to the Google search engine. An annotator is presented with a question along with a Wikipedia page from the top 5 search results, and annotates a long answer (typically a paragraph) and a short answer (one or more entities) if present on the page, or marks null if no long/short answer is present. The public release consists of 307,373 training examples with single annotations; 7,830 examples with 5-way annotations for development data; and a further 7,842 examples with 5-way annotations sequestered as test data. We present experiments validating the quality of the data. We also describe an analysis of 25-way annotations on 302 examples, giving insights into human variability on the annotation task. We introduce robust metrics for the purposes of evaluating question answering systems; demonstrate high human upper bounds on these metrics; and establish baseline results using competitive methods drawn from related literature.
ISSN: 2307-387X
DOI: 10.1162/tacl_a_00276
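The annotation scheme described in the summary (a long answer that is typically a paragraph, zero or more short entity answers, and null when the page contains no answer) maps naturally onto a small record type. The sketch below is a minimal illustration of that structure only; the names `NQExample` and `Span`, and the token offsets, are invented here and are not the corpus's actual release schema.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Span:
    """A token span into the annotated Wikipedia page (offsets are placeholders)."""
    start_token: int
    end_token: int


@dataclass
class NQExample:
    """One annotated example: a real query plus a single annotator's labels.

    A long answer is typically a paragraph; short answers are one or more
    entities inside it. Either may be absent, matching the null annotations
    the summary describes.
    """
    question: str
    page_title: str
    long_answer: Optional[Span] = None                        # None == null long answer
    short_answers: List[Span] = field(default_factory=list)   # empty == null short answer


# A question whose page contains a paragraph answer and one entity inside it.
ex = NQExample(
    question="who founded the mit press",
    page_title="MIT Press",
    long_answer=Span(start_token=120, end_token=185),
    short_answers=[Span(start_token=133, end_token=136)],
)
```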
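The 5-way annotations on the development and test sets imply some aggregation rule when scoring a system, since annotators can disagree about whether an answer exists at all. A minimal sketch of one such rule, assuming an at-least-k-of-5 threshold for deciding whether the gold standard contains an answer and exact match against any single annotator's label; the threshold k=2 and the helper names are assumptions for illustration, not the paper's official evaluation script.

```python
from typing import Optional, Sequence


def gold_has_answer(annotations: Sequence[Optional[str]], k: int = 2) -> bool:
    """Aggregate 5-way annotations: treat the gold standard as non-null if
    at least k annotators marked a non-null answer (k=2 is an assumption)."""
    return sum(a is not None for a in annotations) >= k


def is_correct(prediction: Optional[str],
               annotations: Sequence[Optional[str]]) -> bool:
    """Credit for one example: a null gold requires a null prediction;
    otherwise the prediction must match some annotator's answer."""
    if not gold_has_answer(annotations):
        return prediction is None
    return prediction is not None and prediction in annotations


# Toy check: 2 of 5 annotators found an answer, and the system matched one.
assert is_correct("453-466", ["453-466", "pp. 453-466", None, None, None])
```

Corpus-level precision and recall would then count correct non-null predictions against, respectively, the number of non-null predictions made and the number of examples whose aggregated gold is non-null.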