A Theoretically Grounded Question Answering Data Set for Evaluating Machine Common Sense

Bibliographic Details
Published in	Data Intelligence Vol. 6; no. 1; pp. 1-28
Main Authors	Santos, Henrique; Shen, Ke; Mulvehill, Alice M.; Kejriwal, Mayank; McGuinness, Deborah L.
Format	Journal Article
Language	English
Published	Cambridge: The MIT Press Journals, 01.12.2024
Subjects	Artificial intelligence; Benchmarks; Datasets; Questions; Reasoning; Semantics; Statistical models

More Information
Summary:	Achieving machine common sense has been a longstanding problem within Artificial Intelligence. Thus far, benchmark data sets that are grounded in a theory of common sense and can be used to conduct rigorous, semantic evaluations of common sense reasoning (CSR) systems have been lacking. One expectation of the AI community is that neuro-symbolic reasoners can help bridge this gap towards more dependable systems with common sense. We propose a novel benchmark, called Theoretically Grounded common sense Reasoning (TG-CSR), modeled as a set of question answering instances, with each instance grounded in a semantic category of common sense, such as space, time, and emotions. The benchmark is few-shot, i.e., only a few training and validation examples are provided in the public release, to avoid the possibility of overfitting. Results from recent evaluations suggest that TG-CSR is challenging even for state-of-the-art statistical models. Due to its semantic rigor, this benchmark can be used to evaluate the common sense reasoning capabilities of neuro-symbolic systems.
ISSN:	2641-435X
DOI:	10.1162/dint_a_00234