Relational Knowledge Distillation

Bibliographic Details
Published in: 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 3962-3971
Main Authors: Park, Wonpyo; Kim, Dongju; Lu, Yan; Cho, Minsu
Format: Conference Proceeding
Language: English
Published: IEEE, 01.06.2019

Summary: Knowledge distillation aims at transferring knowledge acquired in one model (a teacher) to another model (a student) that is typically smaller. Previous approaches can be expressed as a form of training the student to mimic output activations of individual data examples represented by the teacher. We introduce a novel approach, dubbed relational knowledge distillation (RKD), that transfers mutual relations of data examples instead. For concrete realizations of RKD, we propose distance-wise and angle-wise distillation losses that penalize structural differences in relations. Experiments conducted on different tasks show that the proposed method improves trained student models by a significant margin. In particular, for metric learning, it allows students to outperform their teachers, achieving state-of-the-art results on standard benchmark datasets.
ISSN: 2575-7075
DOI: 10.1109/CVPR.2019.00409
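
To make the two losses named in the summary concrete, the sketch below illustrates in PyTorch a distance-wise term (pairwise distances within a mini-batch, normalized by their mean) and an angle-wise term (cosines of the angles formed by example triplets), each compared between teacher and student embeddings with a Huber (smooth-L1) loss. This is a minimal sketch based on the abstract's description and common re-implementations, not the authors' official code; the function names, the mean normalization, and the use of smooth_l1_loss are assumptions here.

# Minimal sketch of RKD-style relational losses (assumption: not the official code).
import torch
import torch.nn.functional as F


def rkd_distance_loss(student: torch.Tensor, teacher: torch.Tensor) -> torch.Tensor:
    """Penalize differences between pairwise-distance structures.

    student, teacher: (batch, dim) embeddings of the same mini-batch.
    """
    def normalized_pdist(x):
        d = torch.cdist(x, x, p=2)          # (batch, batch) pairwise distances
        mean = d[d > 0].mean()              # normalize by mean non-zero distance (assumed)
        return d / (mean + 1e-12)

    with torch.no_grad():
        t_d = normalized_pdist(teacher)     # teacher relations are fixed targets
    s_d = normalized_pdist(student)
    return F.smooth_l1_loss(s_d, t_d)       # Huber loss on the relational structure


def rkd_angle_loss(student: torch.Tensor, teacher: torch.Tensor) -> torch.Tensor:
    """Penalize differences between angle-wise potentials of example triplets."""
    def angle_potentials(x):
        # diff[j, i] = vector from x[j] to x[i]; normalizing gives unit directions,
        # and the batched matrix product yields cosines of angles at each vertex.
        diff = x.unsqueeze(0) - x.unsqueeze(1)   # (batch, batch, dim)
        e = F.normalize(diff, p=2, dim=2)
        return torch.bmm(e, e.transpose(1, 2))   # (batch, batch, batch) cosines

    with torch.no_grad():
        t_a = angle_potentials(teacher)
    s_a = angle_potentials(student)
    return F.smooth_l1_loss(s_a, t_a)

In a typical training setup these two terms would be added, with tunable weights, to the student's original task loss; the weights and the mini-batch sampling strategy are hyperparameters not specified in this record.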