Harmonizing knowledge Transfer in Neural Network with Unified Distillation
Main Authors | |
---|---|
Format | Journal Article |
Language | English |
Published | 27.09.2024 |
Subjects | |
Summary: | Knowledge distillation (KD), known for its ability to transfer knowledge from
a cumbersome network (teacher) to a lightweight one (student) without altering
the architecture, has been garnering increasing attention. Two primary
categories emerge within KD methods: feature-based, focusing on intermediate
layers' features, and logits-based, targeting the final layer's logits. This
paper introduces a novel perspective by leveraging diverse knowledge sources
within a unified KD framework. Specifically, we aggregate features from
intermediate layers into a comprehensive representation, effectively gathering
semantic information from different stages and scales. Subsequently, we predict
the distribution parameters from this representation. These steps transform
knowledge from the intermediate layers into corresponding distributive forms,
thereby allowing for knowledge distillation through a unified distribution
constraint at different stages of the network, ensuring the comprehensiveness
and coherence of knowledge transfer. Numerous experiments were conducted to
validate the effectiveness of the proposed method. |
---|---|
DOI: | 10.48550/arxiv.2409.18565 |
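The abstract above describes the approach only at a high level, and this record carries no implementation details. As a rough illustration of the pipeline it outlines, the sketch below shows one way it could look in PyTorch: intermediate feature maps are pooled and aggregated into a single representation, a small head predicts the parameters of a diagonal Gaussian, and the student is matched to the teacher through a KL-divergence constraint on those distributions. The module names (`DistributionHead`, `unified_distillation_loss`), the Gaussian parameterization, and the KL objective are assumptions made for this sketch, not the authors' actual design.

```python
# Illustrative sketch only -- NOT the paper's released code. Assumed design:
# pool each intermediate feature map, aggregate into one representation,
# predict diagonal-Gaussian parameters, and distill with a KL constraint.
import torch
import torch.nn as nn
import torch.nn.functional as F


class DistributionHead(nn.Module):
    """Aggregates multi-stage features and predicts (mean, log-variance)."""

    def __init__(self, stage_dims, latent_dim=128):
        super().__init__()
        # one projection per network stage (stage_dims = channel counts, assumed)
        self.proj = nn.ModuleList(nn.Linear(d, latent_dim) for d in stage_dims)
        self.mu = nn.Linear(latent_dim * len(stage_dims), latent_dim)
        self.logvar = nn.Linear(latent_dim * len(stage_dims), latent_dim)

    def forward(self, feats):
        # feats: list of (B, C_i, H_i, W_i) tensors from different stages/scales
        pooled = [F.adaptive_avg_pool2d(f, 1).flatten(1) for f in feats]
        z = torch.cat([p(x) for p, x in zip(self.proj, pooled)], dim=1)
        return self.mu(z), self.logvar(z)


def gaussian_kl(mu_s, logvar_s, mu_t, logvar_t):
    """KL(student || teacher) between diagonal Gaussians, averaged over the batch."""
    var_s, var_t = logvar_s.exp(), logvar_t.exp()
    kl = 0.5 * (logvar_t - logvar_s + (var_s + (mu_s - mu_t) ** 2) / var_t - 1.0)
    return kl.sum(dim=1).mean()


def unified_distillation_loss(student_feats, teacher_feats, head_s, head_t):
    """A single distribution constraint over aggregated intermediate knowledge."""
    mu_s, logvar_s = head_s(student_feats)
    with torch.no_grad():  # the teacher only supplies the target distribution
        mu_t, logvar_t = head_t(teacher_feats)
    return gaussian_kl(mu_s, logvar_s, mu_t, logvar_t)


# Hypothetical usage: channel counts below are placeholders.
# head_s = DistributionHead(stage_dims=[64, 128, 256])
# head_t = DistributionHead(stage_dims=[256, 512, 1024])
# loss = task_loss + alpha * unified_distillation_loss(s_feats, t_feats, head_s, head_t)
```

In training, such a term would typically be added to the standard task loss with a weighting coefficient, possibly alongside a logits-based KD term; the exact distributional form and weighting used in the paper are not recoverable from this record.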