Transferring Inter-Class Correlation for Teacher–Student frameworks with flexible models
Published in | Knowledge-based systems Vol. 242; p. 108316 |
---|---|
Main Authors | , , , , , |
Format | Journal Article |
Language | English |
Published | Amsterdam, Elsevier B.V, 22.04.2022, Elsevier Science Ltd |
Subjects | |
Summary: | The Teacher–Student (T–S) framework is widely used in classification tasks: the performance of one neural network (the student) is improved by transferring knowledge from another, already trained neural network (the teacher). Because the transferred knowledge depends on the capacities and structures of the teacher and the student, how to define that knowledge effectively remains an open question. To address this issue, we design a novel and flexible form of transferable knowledge, the Self-Attention based Inter-Class Correlation (ICC) map, which reveals the correlation between every pair of classes in a mini-batch. Based on the ICC map, we propose a T–S framework, Inter-Class Correlation Transfer (ICCT), in which knowledge from a teacher with higher, equal, or lower capacity than the student can benefit the student’s training. ICCT can be applied flexibly to T–S pairs with heterogeneous network structures and is highly compatible with existing frameworks that transfer hidden-layer knowledge. Notably, the analysis of ICCT shows that students combine the teacher’s knowledge with their own understanding rather than mimicking the teacher entirely. Extensive experiments are conducted on the CIFAR-10, CIFAR-100, and ILSVRC2012 image classification datasets in different T–S application scenarios with different network structures. The results demonstrate that ICCT improves the student’s performance and outperforms other state-of-the-art T–S frameworks. |
---|---|
ISSN: | 0950-7051 1872-7409 |
DOI: | 10.1016/j.knosys.2022.108316 |
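
The summary describes the ICC map only at a high level, so the following is a minimal illustrative sketch and not the authors' formulation. It assumes that each class in a mini-batch is represented by the mean of its feature vectors, that the map is a row-wise softmax over scaled dot products between those class means, and that the transfer loss is a mean squared difference between the teacher's and the student's maps. The function names `class_means`, `icc_map`, and `icc_transfer_loss` are hypothetical.

```python
import math


def class_means(features, labels):
    """Average the feature vectors of each class present in the mini-batch."""
    sums, counts = {}, {}
    for vec, lab in zip(features, labels):
        if lab not in sums:
            sums[lab] = [0.0] * len(vec)
            counts[lab] = 0
        sums[lab] = [s + v for s, v in zip(sums[lab], vec)]
        counts[lab] += 1
    classes = sorted(sums)
    return classes, [[s / counts[c] for s in sums[c]] for c in classes]


def icc_map(features, labels):
    """Assumed form: softmax over scaled dot products between class means."""
    classes, means = class_means(features, labels)
    scale = math.sqrt(len(means[0]))
    rows = []
    for query in means:
        scores = [sum(a * b for a, b in zip(query, key)) / scale for key in means]
        top = max(scores)  # subtract the maximum for numerical stability
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        rows.append([e / total for e in exps])
    return classes, rows


def icc_transfer_loss(student_map, teacher_map):
    """Assumed loss: mean squared difference between two class-by-class maps."""
    n = len(student_map)
    return sum(
        (s - t) ** 2
        for s_row, t_row in zip(student_map, teacher_map)
        for s, t in zip(s_row, t_row)
    ) / (n * n)


# Toy mini-batch: the teacher has 3-dimensional features, the student 2-dimensional.
labels = [0, 0, 1, 2]
teacher_feats = [[1.0, 0.0, 0.0], [1.0, 0.2, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
student_feats = [[0.9, 0.1], [1.0, 0.0], [0.1, 0.8], [0.5, 0.5]]

classes, teacher_map = icc_map(teacher_feats, labels)
_, student_map = icc_map(student_feats, labels)
loss = icc_transfer_loss(student_map, teacher_map)
```

In this sketch the map has one row and one column per class in the mini-batch, so its shape does not depend on the feature width of either network. That property is one plausible reading of why a class-level correlation map could be exchanged between teacher and student networks with different structures.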