Teaching What You Should Teach: A Data-Based Distillation Method
Main Authors:
Format: Journal Article
Language: English
Published: 11.12.2022
Summary: In real teaching scenarios, an excellent teacher always teaches what he or she is good at but the student is not. This gives the student the best assistance in making up for his or her weaknesses and becoming good overall. Enlightened by this, we introduce the "Teaching what you Should Teach" strategy into a knowledge distillation framework and propose a data-based distillation method named "TST" that searches for desirable augmented samples to assist in distilling more efficiently and rationally. Specifically, we design a neural network-based data augmentation module with a prior bias, which learns magnitudes and probabilities to generate data samples that match the teacher's strengths but the student's weaknesses. By training the data augmentation module and the generalized distillation paradigm in turn, a student model with excellent generalization ability is learned. To verify the effectiveness of our method, we conduct extensive comparative experiments on object recognition, detection, and segmentation tasks. The results on the CIFAR-10, ImageNet-1k, MS-COCO, and Cityscapes datasets demonstrate that our method achieves state-of-the-art performance on almost all teacher-student pairs. Furthermore, we conduct visualization studies to explore which magnitudes and probabilities are needed for the distillation process.
DOI: 10.48550/arxiv.2212.05422
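The summary describes an alternating scheme: an augmentation module with learnable magnitudes and probabilities is tuned to surface samples the teacher handles well but the student does not, and the student is then distilled on those samples. The sketch below is only an illustration of that idea under simplifying assumptions, not the authors' TST implementation; the augmentation operations, the soft (blended) relaxation of the probabilities, the teacher-minus-student loss gap used as the search signal, and all names such as `DifferentiableAugment` and `train_step` are hypothetical.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class DifferentiableAugment(nn.Module):
    """Two toy ops (brightness, contrast) with learnable magnitudes; per-op
    probabilities are relaxed to soft blending weights so gradients flow."""

    def __init__(self):
        super().__init__()
        self.magnitude = nn.Parameter(torch.zeros(2))   # per-op strength
        self.prob_logit = nn.Parameter(torch.zeros(2))  # per-op probability (pre-sigmoid)

    def forward(self, x):
        p = torch.sigmoid(self.prob_logit)
        brightened = x + self.magnitude[0]
        mean = x.mean(dim=(1, 2, 3), keepdim=True)
        contrasted = (x - mean) * (1.0 + self.magnitude[1]) + mean
        x = p[0] * brightened + (1.0 - p[0]) * x
        x = p[1] * contrasted + (1.0 - p[1]) * x
        return x.clamp(0.0, 1.0)  # assumes inputs scaled to [0, 1]


def kd_loss(student_logits, teacher_logits, T=4.0):
    """Standard temperature-scaled soft-label distillation loss."""
    return F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)


def train_step(x, y, teacher, student, augment, opt_student, opt_augment):
    """One alternating step: (1) update the augmentation module to widen the
    teacher-student performance gap, (2) distill the student on the result.
    Teacher parameters are assumed frozen (requires_grad=False)."""
    teacher.eval()

    # Phase 1: gradients flow through both networks into the augmentation
    # parameters; maximizing the gap favors samples the teacher classifies
    # well but the student does not.
    x_aug = augment(x)
    gap = F.cross_entropy(student(x_aug), y) - F.cross_entropy(teacher(x_aug), y)
    opt_augment.zero_grad()
    (-gap).backward()  # gradient ascent on the gap
    opt_augment.step()

    # Phase 2: distill the student on the (now fixed) augmented samples.
    with torch.no_grad():
        x_aug = augment(x)
        t_logits = teacher(x_aug)
    s_logits = student(x_aug)
    loss = F.cross_entropy(s_logits, y) + kd_loss(s_logits, t_logits)
    opt_student.zero_grad()
    loss.backward()
    opt_student.step()
    return loss.item()
```

In the paper's setting such a loop would run over CIFAR-10 or ImageNet-1k batches with a strong teacher and a smaller student; the actual augmentation search space, loss formulation, and training schedule in TST differ from this toy version.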