KNOW How to Make Up Your Mind! Adversarially Detecting and Alleviating Inconsistencies in Natural Language Explanations
Main Authors | |
Format | Journal Article |
Language | English |
Published | 05.06.2023 |
Summary: | The 61st Annual Meeting of the Association for Computational Linguistics (ACL 2023). While recent works have been considerably improving the quality of the natural language explanations (NLEs) generated by a model to justify its predictions, there is very limited research in detecting and alleviating inconsistencies among generated NLEs. In this work, we leverage external knowledge bases to significantly improve on an existing adversarial attack for detecting inconsistent NLEs. We apply our attack to high-performing NLE models and show that models with higher NLE quality do not necessarily generate fewer inconsistencies. Moreover, we propose an off-the-shelf mitigation method to alleviate inconsistencies by grounding the model into external background knowledge. Our method decreases the inconsistencies of previous high-performing NLE models as detected by our attack. |
DOI: | 10.48550/arxiv.2306.02980 |
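The summary's core idea, probing an NLE model with knowledge-base-derived input perturbations and flagging explanation pairs that contradict each other, can be illustrated with a minimal sketch. Everything below is a hypothetical illustration and not code from the paper: `model_explain`, `contradicts`, and the WordNet-antonym perturbation are stand-ins for the authors' actual attack and mitigation method.

```python
from nltk.corpus import wordnet as wn  # requires a one-time nltk.download("wordnet")


def antonyms(word):
    """Collect antonyms of `word` from WordNet, standing in for an external knowledge base."""
    found = set()
    for synset in wn.synsets(word):
        for lemma in synset.lemmas():
            for ant in lemma.antonyms():
                found.add(ant.name().replace("_", " "))
    return found


def perturb(sentence):
    """Yield copies of `sentence` with a single word replaced by a WordNet antonym."""
    tokens = sentence.split()
    for i, tok in enumerate(tokens):
        for ant in antonyms(tok.lower()):
            yield " ".join(tokens[:i] + [ant] + tokens[i + 1:])


def contradicts(nle_a, nle_b):
    """Naive stand-in for a real contradiction check (e.g., an NLI model):
    treats 'X' versus 'not X' as inconsistent."""
    stripped_a, stripped_b = nle_a.replace("not ", ""), nle_b.replace("not ", "")
    return stripped_a == stripped_b and nle_a != nle_b


def probe_for_inconsistent_nles(model_explain, premise, hypothesis):
    """Query a hypothetical NLE model on the original and perturbed inputs and
    return (variant, original NLE, variant NLE) triples whose explanations clash.
    `model_explain(premise, hypothesis)` is assumed to return (label, nle)."""
    _, original_nle = model_explain(premise, hypothesis)
    findings = []
    for variant in perturb(hypothesis):
        _, variant_nle = model_explain(premise, variant)
        if contradicts(original_nle, variant_nle):
            findings.append((variant, original_nle, variant_nle))
    return findings
```

Any inconsistencies found this way point to explanations that cannot both be true of the same model; the paper's mitigation additionally grounds the model in external background knowledge so that fewer such pairs are produced in the first place.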