Interpretable Counterfactual Explanations Guided by Prototypes
Published in | Machine Learning and Knowledge Discovery in Databases. Research Track Vol. 12976; pp. 650 - 665 |
---|---|
Main Authors | , |
Format | Book Chapter |
Language | English |
Published | Switzerland: Springer International Publishing AG, 2021 |
Series | Lecture Notes in Computer Science |
Subjects | |
Summary: | We propose a fast, model-agnostic method for finding interpretable counterfactual explanations of classifier predictions by using class prototypes. We show that class prototypes, obtained either through an encoder or through class-specific k-d trees, significantly speed up the search for counterfactual instances and result in more interpretable explanations. We quantitatively evaluate the interpretability of the generated counterfactuals to illustrate the effectiveness of our method on an image dataset (MNIST) and a tabular dataset (Breast Cancer Wisconsin (Diagnostic)). Additionally, we propose a principled approach to handling categorical variables and illustrate our method on the Adult (Census) dataset. Our method also eliminates the computational bottleneck that arises from numerical gradient evaluation for black-box models. |
---|---|
Bibliography: | Electronic supplementary material: The online version of this chapter (https://doi.org/10.1007/978-3-030-86520-7_40) contains supplementary material, which is available to authorized users. |
ISBN: | 3030865193 9783030865191 |
ISSN: | 0302-9743 1611-3349 |
DOI: | 10.1007/978-3-030-86520-7_40 |
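
The summary mentions class prototypes obtained through class-specific k-d trees. The sketch below illustrates one plausible reading of that idea and is not the chapter's implementation: it assumes one k-d tree is built per class from labelled instances, and that the prototype for an instance is its nearest neighbour among all classes other than its own. The helper names `build_class_trees` and `nearest_prototype` and the toy data are hypothetical.

```python
import numpy as np
from scipy.spatial import cKDTree


def build_class_trees(X, labels):
    """Build one k-d tree per class (assumption: labels are class assignments)."""
    trees = {}
    for c in np.unique(labels):
        members = X[labels == c]
        # Keep the member array so a tree index can be mapped back to a point.
        trees[int(c)] = (cKDTree(members), members)
    return trees


def nearest_prototype(x, original_class, trees):
    """Return (class, prototype, distance) of the closest other-class instance."""
    best = None
    for c, (tree, members) in trees.items():
        if c == original_class:
            continue  # a counterfactual must belong to a different class
        dist, idx = tree.query(x, k=1)  # nearest neighbour within class c
        if best is None or dist < best[2]:
            best = (c, members[idx], float(dist))
    return best


# Toy two-dimensional data with three classes (illustrative only).
X = np.array([[0.0, 0.0], [0.1, 0.0], [1.0, 1.0], [1.1, 1.0], [5.0, 5.0]])
labels = np.array([0, 0, 1, 1, 2])

trees = build_class_trees(X, labels)
proto_class, proto, dist = nearest_prototype(np.array([0.2, 0.1]), 0, trees)
print(proto_class, proto, dist)
```

In a prototype-guided search, a point such as `proto` would typically serve as a target that pulls the counterfactual toward a region populated by another class; how the chapter incorporates it into the search objective is not stated in this record.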