Interpretable Counterfactual Explanations Guided by Prototypes

Bibliographic Details
Published in: Machine Learning and Knowledge Discovery in Databases. Research Track, Vol. 12976, pp. 650-665
Main Authors: Van Looveren, Arnaud; Klaise, Janis
Format: Book Chapter
Language: English
Published: Switzerland: Springer International Publishing AG, 2021
Series: Lecture Notes in Computer Science

Summary: We propose a fast, model-agnostic method for finding interpretable counterfactual explanations of classifier predictions by using class prototypes. We show that class prototypes, obtained using either an encoder or through class-specific k-d trees, significantly speed up the search for counterfactual instances and result in more interpretable explanations. We quantitatively evaluate the interpretability of the generated counterfactuals to illustrate the effectiveness of our method on an image dataset and a tabular dataset, MNIST and Breast Cancer Wisconsin (Diagnostic) respectively. Additionally, we propose a principled approach to handle categorical variables and illustrate our method on the Adult (Census) dataset. Our method also eliminates the computational bottleneck that arises from numerical gradient evaluation for black-box models.
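
Note: The summary describes the approach only at a high level. As a concrete illustration, below is a minimal, self-contained Python sketch of a prototype-guided counterfactual search; it is not the authors' implementation. It assumes an sklearn-style predict_proba black box, builds per-class prototypes with k-d trees as mentioned in the summary, and replaces the paper's optimisation with a simple gradient-free random local search; all names (class_prototypes, counterfactual, beta, theta) and the exact loss weighting are illustrative assumptions.

    import numpy as np
    from scipy.spatial import cKDTree

    def class_prototypes(X, y, x0, k=5):
        # Prototype of each class: mean of the k training points nearest to x0,
        # located with a class-specific k-d tree.
        protos = {}
        for c in np.unique(y):
            Xc = X[y == c]
            tree = cKDTree(Xc)
            _, idx = tree.query(x0, k=min(k, len(Xc)))
            protos[c] = Xc[np.atleast_1d(idx)].mean(axis=0)
        return protos

    def counterfactual(x0, predict_proba, proto, target, beta=0.1, theta=1.0,
                       steps=2000, step_size=0.05, seed=0):
        # Greedy random search over perturbations delta = x - x0; gradient-free,
        # so it works with any black-box classifier exposing predict_proba.
        rng = np.random.default_rng(seed)
        x, best_loss = x0.copy(), np.inf
        for _ in range(steps):
            cand = x + step_size * rng.normal(size=x0.shape)
            delta = cand - x0
            p = predict_proba(cand[None, :])[0]
            # Hinge-style prediction loss: push the target class above all others.
            l_pred = max(0.0, float(np.delete(p, target).max() - p[target]))
            loss = (l_pred
                    + beta * np.abs(delta).sum()              # L1 sparsity
                    + np.square(delta).sum()                  # L2 proximity
                    + theta * np.square(cand - proto).sum())  # pull toward target-class prototype
            if loss < best_loss:
                x, best_loss = cand, loss
        return x

Hypothetical usage with a trained scikit-learn classifier clf and training data X_train, y_train:

    protos = class_prototypes(X_train, y_train, x0)
    x_cf = counterfactual(x0, clf.predict_proba, protos[target_class], target_class)

The prototype term pulls the counterfactual toward the data distribution of the target class, which is the source of both the interpretability and the speed-up claimed in the summary; using k-d trees instead of an encoder avoids training any additional model.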
Bibliography: Electronic supplementary material: The online version of this chapter (https://doi.org/10.1007/978-3-030-86520-7_40) contains supplementary material, which is available to authorized users.
ISBN: 3030865193; 9783030865191
ISSN: 0302-9743; 1611-3349
DOI: 10.1007/978-3-030-86520-7_40