Clarity in complexity: how aggregating explanations resolves the disagreement problem
Published in | The Artificial Intelligence Review, Vol. 57, No. 12, p. 338
Main Authors | , , , , ,
Format | Journal Article
Language | English
Published | Dordrecht: Springer Netherlands, 19.10.2024 (Springer Nature B.V.)
Summary | In Explainable Machine Learning, the Rashômon Effect refers to the disagreement between the explanations produced by different attribution explainers, as well as to the dissimilarity across multiple explanations generated by a single explainer for the same instance (differences in feature importances and their associated signs and ranks), an undesirable outcome especially in sensitive domains such as healthcare or finance. We propose a method inspired by textual case-based reasoning for aligning explanations from various explainers in order to resolve the disagreement and dissimilarity problems. We iteratively generated 100 explanations for each instance from six popular datasets, using three prevalent feature attribution explainers: LIME, Anchors, and SHAP (in its Tree SHAP and Kernel SHAP variants), and then applied a global cluster-based aggregation strategy that quantifies alignment and reveals similarities and associations between explanations. We evaluated our method on binary classification tasks by weighting the k-NN algorithm with the agreed feature-overlap explanation weights and comparing it to a non-weighted k-NN predictor. We also compared the weighted k-NN algorithm using the aggregated feature-overlap explanation weights to the weighted k-NN algorithm using weights produced by a single explanation method (LIME, SHAP, or Anchors). Our global alignment method benefited the most from hybridization with feature importance scores (information gain), which was essential for obtaining a more accurate estimate of disagreement, for enabling explainers to reach a consensus across multiple explanations, and for supporting effective model learning through improved classification performance.
ISSN | 1573-7462, 0269-2821
DOI | 10.1007/s10462-024-10952-7
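
The summary above describes evaluating aggregated explanation weights by plugging them into a feature-weighted k-NN classifier and comparing against an unweighted baseline. The code below is a minimal sketch of that general idea only, under stated assumptions: it does not reproduce the paper's cluster-based aggregation or its information-gain hybridization, it stands in random placeholder vectors for real LIME/SHAP/Anchors attributions, and the helper names (`aggregate_attributions`, `weighted_knn_accuracy`) as well as the use of scikit-learn's `KNeighborsClassifier` are illustrative choices, not the authors' implementation.

```python
"""Hedged sketch: aggregate repeated attribution vectors into consensus
feature weights and use them in a feature-weighted k-NN classifier."""
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier


def aggregate_attributions(attribution_runs: list[np.ndarray]) -> np.ndarray:
    """Average |attribution| vectors from repeated runs / different explainers,
    then normalize to sum to 1 (a simple stand-in for the paper's
    cluster-based aggregation, which is not reproduced here)."""
    stacked = np.abs(np.vstack(attribution_runs))            # (n_runs, n_features)
    per_run = stacked / stacked.sum(axis=1, keepdims=True)   # normalize each run
    weights = per_run.mean(axis=0)                           # consensus weights
    return weights / weights.sum()


def weighted_knn_accuracy(X_tr, y_tr, X_te, y_te, weights, k=5) -> float:
    """Rescale features by sqrt(weights) so Euclidean k-NN distance becomes
    sum_j w_j * (x_j - x'_j)^2, i.e. a feature-weighted k-NN."""
    scale = np.sqrt(weights)
    knn = KNeighborsClassifier(n_neighbors=k)
    knn.fit(X_tr * scale, y_tr)
    return accuracy_score(y_te, knn.predict(X_te * scale))


if __name__ == "__main__":
    X, y = load_breast_cancer(return_X_y=True)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

    # Placeholder attributions: random non-negative vectors standing in for
    # 100 repeated LIME / SHAP / Anchors explanation runs.
    rng = np.random.default_rng(0)
    runs = [rng.random(X.shape[1]) for _ in range(100)]

    w = aggregate_attributions(runs)
    print("weighted k-NN accuracy:  ",
          weighted_knn_accuracy(X_tr, y_tr, X_te, y_te, w))

    baseline = KNeighborsClassifier(n_neighbors=5).fit(X_tr, y_tr)
    print("unweighted k-NN accuracy:",
          accuracy_score(y_te, baseline.predict(X_te)))
```

In practice the placeholder `runs` would be replaced by actual attribution vectors collected from the explainers, and the comparison of weighted versus unweighted accuracy mirrors, at a much-simplified level, the evaluation protocol sketched in the summary.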