Research on knowledge distillation algorithm based on Yolov5 attention mechanism

Bibliographic Details
Published in: Expert Systems with Applications, Vol. 240, p. 122553
Main Authors: Cheng, Shengjie; Zhou, Peiyong; Liu, Yu; Ma, Hongji; Aysa, Alimjan; Ubul, Kurban
Format: Journal Article
Language: English
Published: Elsevier Ltd, 15.04.2024
Summary: The most advanced current CNN-based detection models are nearly impossible to deploy on mobile devices with limited computing power because of their redundant parameters and excessive compute requirements; knowledge distillation, as a practical model-compression approach, can alleviate this limitation. Past feature-based knowledge distillation algorithms focused on transferring hand-designed local features and neglected the global information in images. To address these shortcomings, we first improve GAMAttention to learn global feature representations, and the improved attention mechanism minimizes the information loss incurred when processing features. Second, feature transfer no longer requires manually defining which features should be transferred; instead, a more interpretable approach lets the student network learn to emulate the high-response feature regions predicted by the teacher network, which makes the model more end-to-end and allows the student to mimic the teacher in generating semantically strong feature maps, improving the detection performance of the small model. To avoid learning too many noisy features from the background, these two parts of the feature distillation are assigned different weights. Finally, logit distillation is performed on the prediction heads of the student and teacher networks. In our experiments we chose Yolov5 as the base network structure for the teacher–student pairs. Improving Yolov5s with attention and knowledge distillation yields a 1.3% performance gain on VOC and a 1.8% gain on KITTI.
• We use knowledge distillation and attention to improve Yolov5s performance.
• We propose the GAMF module, which captures both global and local features.
• We propose RFT and incorporate SSIM to transfer response features and background features.
ISSN: 0957-4174, 1873-6793
DOI: 10.1016/j.eswa.2023.122553
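
To make the loss composition described in the summary more concrete, the following is a minimal, hypothetical PyTorch sketch: a feature-distillation term weighted toward the teacher's high-response feature regions, a down-weighted term for background (low-response) regions, and a temperature-scaled logit-distillation term on the prediction heads. This is not the authors' code; the function names (response_mask, distillation_loss) and the weights (w_fg, w_bg, w_logit, tau) are illustrative assumptions, and the SSIM-based background transfer mentioned in the highlights is simplified here to a plain masked MSE.

import torch
import torch.nn.functional as F

def response_mask(teacher_feat):
    # Channel-averaged absolute activation of the teacher feature map,
    # min-max normalized per image to [0, 1]; high values mark the
    # "high-response regions" the student is asked to emulate.
    resp = teacher_feat.abs().mean(dim=1, keepdim=True)              # (N, 1, H, W)
    resp = resp - resp.amin(dim=(2, 3), keepdim=True)
    return resp / (resp.amax(dim=(2, 3), keepdim=True) + 1e-6)

def distillation_loss(s_feat, t_feat, s_logits, t_logits,
                      w_fg=1.0, w_bg=0.1, w_logit=1.0, tau=2.0):
    # Feature distillation weighted toward the teacher's high-response
    # regions; background (low-response) regions get a smaller weight.
    mask = response_mask(t_feat)
    per_pixel = (s_feat - t_feat).pow(2).mean(dim=1, keepdim=True)   # (N, 1, H, W)
    feat_fg = (mask * per_pixel).mean()
    feat_bg = ((1.0 - mask) * per_pixel).mean()
    # Temperature-scaled logit distillation on the prediction heads.
    logit = F.kl_div(F.log_softmax(s_logits / tau, dim=-1),
                     F.softmax(t_logits / tau, dim=-1),
                     reduction="batchmean") * tau * tau
    return w_fg * feat_fg + w_bg * feat_bg + w_logit * logit

In this sketch, s_feat and t_feat would be spatially aligned feature maps from matching Yolov5 student and teacher layers (after any channel adaptation), and s_logits and t_logits the corresponding prediction-head outputs.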