Artificial Intelligence Models Do Not Ground Negation, Humans Do. GuessWhat?! Dialogues as a Case Study

Bibliographic Details
Published in: Frontiers in Big Data, Vol. 4, p. 736709
Main Authors: Testoni, Alberto; Greco, Claudio; Bernardi, Raffaella
Format: Journal Article
Language: English
Published: Switzerland: Frontiers Media S.A., 24.01.2022
ISSN: 2624-909X
DOI: 10.3389/fdata.2021.736709


More Information
Summary: Negation is widely present in human communication, yet it is largely neglected in research on conversational agents based on neural network architectures. Cognitive studies show that a supportive visual context makes the processing of negation easier. We take GuessWhat?!, a referential visually grounded guessing game, as a test-bed and evaluate to what extent guessers based on pre-trained language models profit from negatively answered polar questions. Moreover, to get a better grasp of the models' results, we select a controlled sample of games and run a crowdsourcing experiment with human subjects. We evaluate models and humans in the same settings and use the comparison to better interpret the models' results. We show that while humans profit from negatively answered questions to solve the task, models struggle to ground negation, and some of them barely use it; however, when the language signal is poorly informative, visual features help encode the negative information. Finally, the experiments with human subjects allow us to compare human and model predictions and to identify which models make errors that are more human-like and, as such, more plausible.
Bibliography:
Edited by: Balaraman Ravindran, Indian Institute of Technology Madras, India
This article was submitted to Machine Learning and Artificial Intelligence, a section of the journal Frontiers in Big Data
These authors have contributed equally to this work
Reviewed by: Ujwal Gadiraju, Delft University of Technology, Netherlands; Parisa Kordjamshidi, Michigan State University, United States