The Irrationality of Neural Rationale Models

Bibliographic Details
Published in: arXiv.org
Main Authors: Zheng, Yiming; Booth, Serena; Shah, Julie; Zhou, Yilun
Format: Paper
Language: English
Published: Ithaca: Cornell University Library, arXiv.org, 24.07.2022

Summary: Neural rationale models are popular for interpretable predictions of NLP tasks. In these, a selector extracts segments of the input text, called rationales, and passes these segments to a classifier for prediction. Since the rationale is the only information accessible to the classifier, it is plausibly defined as the explanation. Is such a characterization unconditionally correct? In this paper, we argue to the contrary, with both philosophical perspectives and empirical evidence suggesting that rationale models are, perhaps, less rational and interpretable than expected. We call for more rigorous and comprehensive evaluations of these models to ensure desired properties of interpretability are indeed achieved. The code can be found at https://github.com/yimingz89/Neural-Rationale-Analysis.
ISSN: 2331-8422
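
Note: The summary describes a select-then-classify pipeline in which a binary token mask is the only information the classifier sees. The listing below is a minimal sketch of such a model, assuming a PyTorch implementation with a straight-through Gumbel sampler for differentiable selection; all class and parameter names here (Selector, Classifier, RationaleModel, embed_dim) are illustrative and are not taken from the paper or its linked repository.

    # Minimal sketch of a neural rationale model (selector -> classifier).
    # Illustrative only; the paper's actual code is at the GitHub link above.
    import torch
    import torch.nn as nn

    class Selector(nn.Module):
        """Scores each token and samples a binary rationale mask."""
        def __init__(self, vocab_size: int, embed_dim: int = 64):
            super().__init__()
            self.embed = nn.Embedding(vocab_size, embed_dim)
            self.scorer = nn.Linear(embed_dim, 1)

        def forward(self, tokens: torch.Tensor) -> torch.Tensor:
            # tokens: (batch, seq_len) integer ids
            logits = self.scorer(self.embed(tokens)).squeeze(-1)
            if self.training:
                # Straight-through Gumbel-softmax keeps the discrete
                # keep/drop decision differentiable during training.
                two_way = torch.stack([logits, -logits], dim=-1)
                mask = nn.functional.gumbel_softmax(two_way, hard=True, dim=-1)[..., 0]
            else:
                mask = (logits > 0).float()
            return mask  # (batch, seq_len); 1 = token kept in the rationale

    class Classifier(nn.Module):
        """Predicts a label from the masked (rationale-only) input."""
        def __init__(self, vocab_size: int, num_classes: int, embed_dim: int = 64):
            super().__init__()
            self.embed = nn.Embedding(vocab_size, embed_dim)
            self.out = nn.Linear(embed_dim, num_classes)

        def forward(self, tokens: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
            # Zero out embeddings of unselected tokens so only the
            # rationale reaches the prediction head.
            emb = self.embed(tokens) * mask.unsqueeze(-1)
            pooled = emb.sum(dim=1) / mask.sum(dim=1, keepdim=True).clamp(min=1.0)
            return self.out(pooled)

    class RationaleModel(nn.Module):
        def __init__(self, vocab_size: int, num_classes: int):
            super().__init__()
            self.selector = Selector(vocab_size)
            self.classifier = Classifier(vocab_size, num_classes)

        def forward(self, tokens: torch.Tensor):
            mask = self.selector(tokens)
            logits = self.classifier(tokens, mask)
            return logits, mask  # the mask doubles as the "explanation"

    # Usage: the returned mask is what such models present as the rationale.
    model = RationaleModel(vocab_size=10_000, num_classes=2)
    tokens = torch.randint(0, 10_000, (4, 20))
    logits, rationale = model(tokens)

Zeroing out unselected embeddings is what makes the mask the classifier's only input, which is precisely the property whose adequacy as an "explanation" the paper calls into question.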