Improving Network Interpretability via Explanation Consistency Evaluation
Format | Journal Article |
Language | English |
Published | 08.08.2024 |
---|---|
Summary: | While deep neural networks have achieved remarkable performance, they tend to
lack transparency in prediction. The pursuit of greater interpretability in
neural networks often results in a degradation of their original performance.
Some works strive to improve both interpretability and performance, but they
primarily depend on meticulously imposed conditions. In this paper, we propose
a simple yet effective framework that acquires more explainable activation
heatmaps and simultaneously increases the model performance, without the need
for any extra supervision. Specifically, our concise framework introduces a new
metric, i.e., explanation consistency, to reweight the training samples
adaptively in model learning. The explanation consistency metric is utilized to
measure the similarity between the model's visual explanations of the original
samples and those of semantic-preserved adversarial samples, whose background
regions are perturbed by using image adversarial attack techniques. Our
framework then promotes the model learning by paying closer attention to those
training samples with a high difference in explanations (i.e., low explanation
consistency), for which the current model cannot provide robust
interpretations. Comprehensive experimental results on various benchmarks
demonstrate the superiority of our framework in multiple aspects, including
higher recognition accuracy, greater data debiasing capability, stronger
network robustness, and more precise localization ability on both regular
networks and interpretable networks. We also provide extensive ablation studies
and qualitative analyses to unveil the detailed contribution of each component. |
---|---|
DOI: | 10.48550/arxiv.2408.04600 |
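The abstract describes reweighting training samples by an explanation-consistency score: the explanations of an original image and of a background-perturbed adversarial counterpart are compared, and samples with inconsistent explanations receive higher weight. The sketch below illustrates that idea only; the cosine-similarity measure, the `1 - consistency` weighting rule, and the mean-one normalization are illustrative assumptions, not the paper's stated formulation.

```python
import numpy as np

def explanation_consistency(heatmap_orig, heatmap_adv, eps=1e-8):
    """Cosine similarity between flattened explanation heatmaps
    (e.g., Grad-CAM-style activation maps). Higher means the model
    explains the original and perturbed samples more consistently."""
    a = heatmap_orig.ravel().astype(float)
    b = heatmap_adv.ravel().astype(float)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + eps))

def sample_weights(consistencies, eps=1e-8):
    """Map per-sample consistencies to training weights: low consistency
    (unstable explanation) -> high weight. Normalized to mean 1 so the
    overall loss scale is preserved; this normalization is an assumption."""
    c = np.asarray(consistencies, dtype=float)
    w = 1.0 - np.clip(c, 0.0, 1.0)        # emphasize inconsistent samples
    mean = w.mean()
    return w / (mean + eps) if mean > 0 else np.ones_like(w)
```

In a training loop, these weights would multiply the per-sample loss (e.g., a cross-entropy computed with `reduction='none'` in PyTorch) before averaging, so that samples the current model cannot explain robustly contribute more to the gradient.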