Quantifying the Knowledge in a DNN to Explain Knowledge Distillation for Classification
Main Authors | , , , |
---|---|
Format | Journal Article |
Language | English |
Published | 18.08.2022 |
Subjects | |
Summary: | Compared to traditional learning from scratch, knowledge distillation
sometimes makes the DNN achieve superior performance. This paper provides a new
perspective to explain the success of knowledge distillation, i.e., quantifying
knowledge points encoded in intermediate layers of a DNN for classification,
based on information theory. To this end, we treat signal processing
in a DNN as layer-wise information discarding. A knowledge point is
defined as an input unit whose information is discarded much less than that of
other input units. Thus, we propose three hypotheses for knowledge distillation
based on the quantification of knowledge points. 1. The DNN learning from
knowledge distillation encodes more knowledge points than the DNN learning from
scratch. 2. Knowledge distillation makes the DNN more likely to learn different
knowledge points simultaneously. In comparison, the DNN learning from scratch
tends to encode various knowledge points sequentially. 3. The DNN learning from
knowledge distillation is often optimized more stably than the DNN learning
from scratch. To verify these hypotheses, we design three types of
metrics, using annotations of foreground objects, to analyze the feature
representations of the DNN, i.e., the quantity and the quality of
knowledge points, the learning speed of different knowledge points, and the
stability of optimization directions. In experiments, we diagnosed various DNNs
for different classification tasks, i.e., image classification, 3D point cloud
classification, binary sentiment classification, and question answering, which
verified the above hypotheses. |
---|---|
DOI: | 10.48550/arxiv.2208.08741 |