Concept Gradient: Concept-based Interpretation Without Linear Assumption
Main Authors | |
Format | Journal Article |
Language | English |
Published | 31.08.2022 |
Subjects | |
Online Access | Get full text |
Summary: | Concept-based interpretations of black-box models are often more intuitive for humans to understand. The most widely adopted approach for concept-based interpretation is the Concept Activation Vector (CAV). CAV relies on learning a linear relation between some latent representation of a given model and concepts. Linear separability is usually implicitly assumed but does not hold in general. In this work, we started from the original intent of concept-based interpretation and proposed Concept Gradient (CG), extending concept-based interpretation beyond linear concept functions. We showed that for a general (potentially non-linear) concept, we can mathematically evaluate how a small change in a concept affects the model's prediction, which leads to an extension of gradient-based interpretation to the concept space. We demonstrated empirically that CG outperforms CAV in both toy examples and real-world datasets. |
DOI | 10.48550/arxiv.2208.14966 |
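
The abstract's key step, evaluating how a small change in a concept affects the model's prediction, amounts to chaining gradients through a (possibly non-linear) concept function. Below is a minimal sketch of that idea in PyTorch; the toy model `f`, concept function `g`, and the pseudo-inverse chain rule are illustrative assumptions based on the abstract, not the authors' released code.

```python
import torch

# Hypothetical black-box model f: R^3 -> R (scalar prediction).
def f(x):
    w = torch.tensor([1.0, -2.0, 0.5])
    return torch.tanh(x @ w)

# Hypothetical non-linear concept function g: R^3 -> R^2
# (one score per concept); stands in for a learned concept model.
def g(x):
    return torch.stack([torch.sigmoid(x.sum()), (x ** 2).sum()])

x = torch.randn(3, requires_grad=True)

# df/dx: gradient of the prediction w.r.t. the input.
grad_f = torch.autograd.grad(f(x), x)[0]          # shape (3,)

# dg/dx: Jacobian of the concept scores w.r.t. the input.
jac_g = torch.autograd.functional.jacobian(g, x)  # shape (2, 3)

# Concept-space gradient via the chain rule, using the
# Moore-Penrose pseudo-inverse of the concept Jacobian:
# df/dc ~= pinv(dg/dx)^T @ df/dx. This recovers the exact
# gradient when f factors through g; otherwise it is a
# least-squares approximation.
concept_grad = torch.linalg.pinv(jac_g).T @ grad_f  # shape (2,)
print(concept_grad)  # one attribution score per concept
```

Note the contrast with CAV: no linear probe is fit, so the concept function here may be arbitrarily non-linear; only local differentiability of `f` and `g` is required.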