Harmonizing knowledge Transfer in Neural Network with Unified Distillation
Main Authors | |
---|---|
Format | Journal Article |
Language | English |
Published | 27.09.2024 |
Subjects | |
Summary: | Knowledge distillation (KD), known for its ability to transfer knowledge from
a cumbersome network (teacher) to a lightweight one (student) without altering
the architecture, has been garnering increasing attention. Two primary
categories emerge within KD methods: feature-based, focusing on intermediate
layers' features, and logits-based, targeting the final layer's logits. This
paper introduces a novel perspective by leveraging diverse knowledge sources
within a unified KD framework. Specifically, we aggregate features from
intermediate layers into a comprehensive representation, effectively gathering
semantic information from different stages and scales. Subsequently, we predict
the distribution parameters from this representation. These steps transform
knowledge from the intermediate layers into corresponding distributive forms,
thereby allowing for knowledge distillation through a unified distribution
constraint at different stages of the network, ensuring the comprehensiveness
and coherence of knowledge transfer. Numerous experiments were conducted to
validate the effectiveness of the proposed method. |
---|---|
DOI: | 10.48550/arxiv.2409.18565 |
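The abstract above describes the approach only at a high level, and this record carries no implementation details. As a rough illustration of the pipeline it outlines, the sketch below shows one way it could look in PyTorch: intermediate feature maps are pooled and aggregated into a single representation, a small head predicts the parameters of a diagonal Gaussian, and the student is matched to the teacher through a KL-divergence constraint on those distributions. The module names (`DistributionHead`, `unified_distillation_loss`), the Gaussian parameterization, and the KL objective are assumptions made for this sketch, not the authors' actual design.

```python
# Illustrative sketch only -- NOT the paper's released code. Assumed design:
# pool each intermediate feature map, aggregate into one representation,
# predict diagonal-Gaussian parameters, and distill with a KL constraint.
import torch
import torch.nn as nn
import torch.nn.functional as F


class DistributionHead(nn.Module):
    """Aggregates multi-stage features and predicts (mean, log-variance)."""

    def __init__(self, stage_dims, latent_dim=128):
        super().__init__()
        # one projection per network stage (stage_dims = channel counts, assumed)
        self.proj = nn.ModuleList(nn.Linear(d, latent_dim) for d in stage_dims)
        self.mu = nn.Linear(latent_dim * len(stage_dims), latent_dim)
        self.logvar = nn.Linear(latent_dim * len(stage_dims), latent_dim)

    def forward(self, feats):
        # feats: list of (B, C_i, H_i, W_i) tensors from different stages/scales
        pooled = [F.adaptive_avg_pool2d(f, 1).flatten(1) for f in feats]
        z = torch.cat([p(x) for p, x in zip(self.proj, pooled)], dim=1)
        return self.mu(z), self.logvar(z)


def gaussian_kl(mu_s, logvar_s, mu_t, logvar_t):
    """KL(student || teacher) between diagonal Gaussians, averaged over the batch."""
    var_s, var_t = logvar_s.exp(), logvar_t.exp()
    kl = 0.5 * (logvar_t - logvar_s + (var_s + (mu_s - mu_t) ** 2) / var_t - 1.0)
    return kl.sum(dim=1).mean()


def unified_distillation_loss(student_feats, teacher_feats, head_s, head_t):
    """A single distribution constraint over aggregated intermediate knowledge."""
    mu_s, logvar_s = head_s(student_feats)
    with torch.no_grad():  # the teacher only supplies the target distribution
        mu_t, logvar_t = head_t(teacher_feats)
    return gaussian_kl(mu_s, logvar_s, mu_t, logvar_t)


# Hypothetical usage: channel counts below are placeholders.
# head_s = DistributionHead(stage_dims=[64, 128, 256])
# head_t = DistributionHead(stage_dims=[256, 512, 1024])
# loss = task_loss + alpha * unified_distillation_loss(s_feats, t_feats, head_s, head_t)
```

In training, such a term would typically be added to the standard task loss with a weighting coefficient, possibly alongside a logits-based KD term; the exact distributional form and weighting used in the paper are not recoverable from this record.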