What Makes for a Good Saliency Map? Comparing Strategies for Evaluating Saliency Maps in Explainable AI (XAI)
Format | Journal Article |
---|---|
Language | English |
Published | 23.04.2025 |
DOI | 10.48550/arxiv.2504.17023 |
Summary: Saliency maps are a popular approach for explaining the classifications of (convolutional) neural networks. However, it remains an open question how best to evaluate saliency maps, with three families of evaluation methods commonly in use: subjective user measures, objective user measures, and mathematical metrics. We examine three of the most popular saliency map approaches (viz., LIME, Grad-CAM, and Guided Backpropagation) in a between-subjects study (N=166) across these families of evaluation methods. We test 1) for subjective measures, whether the maps differ with respect to user trust and satisfaction; 2) for objective measures, whether the maps increase users' abilities and thus their understanding of the model; 3) for mathematical metrics, which map achieves the best ratings across metrics; and 4) whether the mathematical metrics can be associated with objective user measures. To our knowledge, our study is the first to compare several saliency maps across all of these evaluation methods, and we find that they do not agree in their assessment (i.e., there was no difference concerning trust and satisfaction, Grad-CAM improved users' abilities best, and Guided Backpropagation had the most favorable mathematical metrics). Additionally, we show that some mathematical metrics were associated with user understanding, although this relationship was often counterintuitive. We discuss these findings in light of general debates concerning the complementary use of user studies and mathematical metrics in the evaluation of explainable AI (XAI) approaches.
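For readers unfamiliar with the saliency map methods named in the summary, the following is a minimal, illustrative sketch of generating a Grad-CAM map with the Captum library for PyTorch. The model (ResNet-18), target layer, input tensor, and class index are assumptions chosen for demonstration; they are not details taken from the paper.

```python
# Illustrative Grad-CAM sketch using Captum (assumes torch, torchvision, and captum are installed).
# The model, target layer, input, and class index below are placeholder assumptions.
import torch
from torchvision.models import resnet18
from captum.attr import LayerGradCam, LayerAttribution

model = resnet18(weights=None).eval()           # any CNN classifier works; weights omitted for brevity
gradcam = LayerGradCam(model, model.layer4)     # attribute w.r.t. the last convolutional block

x = torch.randn(1, 3, 224, 224)                 # placeholder image tensor (batch, channels, H, W)
target_class = 0                                # hypothetical class index to explain

attr = gradcam.attribute(x, target=target_class)           # coarse map over the layer's spatial grid
saliency = LayerAttribution.interpolate(attr, (224, 224))  # upsample to input resolution for display
print(saliency.shape)                                       # torch.Size([1, 1, 224, 224])
```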