End-to-end model compression via pruning and knowledge distillation for lightweight image super resolution
Published in: Pattern Analysis and Applications (PAA), Vol. 28, No. 2
Main Authors: , , ,
Format: Journal Article
Language: English
Published: London: Springer London; Springer Nature B.V., 01.06.2025
Summary: Obtaining lightweight models is crucial for effectively addressing image super-resolution (SR), particularly on resource-constrained devices. Pruning and knowledge distillation (KD) are two commonly employed methods for compressing SR models. However, previous approaches have typically applied these methods either individually or sequentially, starting with pruning and following with KD. This sequential, pipeline approach compartmentalizes the two techniques and fails to fully leverage the teacher model's knowledge to guide the selection of redundant channels, often resulting in suboptimal pruning outcomes. To address these issues, we propose an end-to-end compression strategy named Pruning While Knowledge Distillation (PWKD) that integrates pruning and KD into a single training process, significantly enhancing training efficiency. Furthermore, this novel integrated compression method allows the teacher model to provide guidance during the pruning process, rather than after pruning has been completed. By leveraging knowledge distillation from the teacher model to guide pruning channel selection, it resolves the suboptimal channel selection commonly encountered in sequential training methods. This integration not only streamlines the training process for greater efficiency but also improves performance by ensuring more informed and effective pruning decisions. In addition, for pruning, we design an auto-pruning module that utilizes feature information to adaptively learn pruning masks, enabling automated pruning without manually defined criteria. Moreover, we design a backward-differentiable gating function that ensures the auto-pruning module remains differentiable during backpropagation. Furthermore, to address the challenges of KD, we introduce the Multiscale Wavelet Refine Module (MVRM). Designed to enhance the processing of image edges and intricate textures, MVRM significantly boosts the student model's ability to replicate the teacher model's proficiency in restoring high-frequency information. Our integrated approach has been tested across multiple datasets and consistently demonstrates significant performance improvements.
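The abstract describes two mechanisms that a short sketch can make concrete: a feature-driven pruning gate that stays differentiable during backpropagation, and a distillation loss computed on multiscale wavelet sub-bands to emphasize edges and textures. The PyTorch sketch below is only an illustration of those ideas, not the authors' implementation: the names (AutoPruningGate, haar_decompose, wavelet_distill_loss), the straight-through estimator used as the backward-differentiable gate, and the single-level Haar transform per scale are all assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class AutoPruningGate(nn.Module):
    """Hypothetical auto-pruning gate: predicts a per-channel keep/prune mask
    from feature statistics. The forward pass applies a hard 0/1 mask; the
    backward pass lets gradients flow through the soft sigmoid scores
    (a straight-through estimator), keeping the gate differentiable."""

    def __init__(self, channels: int):
        super().__init__()
        self.fc = nn.Linear(channels, channels)  # scores each channel

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        stats = x.mean(dim=(2, 3))                   # (N, C) global average pooling
        soft = torch.sigmoid(self.fc(stats))         # soft keep-probabilities
        hard = (soft > 0.5).float()                  # hard binary mask
        mask = hard + soft - soft.detach()           # straight-through: hard forward, soft backward
        return x * mask.unsqueeze(-1).unsqueeze(-1)  # zero out pruned channels


def haar_decompose(x: torch.Tensor):
    """Single-level 2D Haar transform into LL, LH, HL, HH sub-bands."""
    a, b = x[..., 0::2, 0::2], x[..., 0::2, 1::2]
    c, d = x[..., 1::2, 0::2], x[..., 1::2, 1::2]
    ll = (a + b + c + d) / 4
    lh = (a + b - c - d) / 4
    hl = (a - b + c - d) / 4
    hh = (a - b - c + d) / 4
    return ll, lh, hl, hh


def wavelet_distill_loss(student_sr: torch.Tensor,
                         teacher_sr: torch.Tensor,
                         levels: int = 2) -> torch.Tensor:
    """Multiscale distillation loss comparing the high-frequency
    (edge/texture) sub-bands of the student and teacher outputs."""
    loss = student_sr.new_zeros(())
    s, t = student_sr, teacher_sr
    for _ in range(levels):
        s_ll, *s_high = haar_decompose(s)
        t_ll, *t_high = haar_decompose(t)
        loss = loss + sum(F.l1_loss(sh, th) for sh, th in zip(s_high, t_high))
        s, t = s_ll, t_ll  # recurse on the low-frequency band for the next scale
    return loss
```

Under these assumptions, a joint training step would wrap student feature maps with AutoPruningGate (so channels whose masks collapse to zero can later be removed) and minimize a combined objective such as `F.l1_loss(student_sr, hr) + lambda_kd * wavelet_distill_loss(student_sr, teacher_sr)`, letting the teacher guide channel selection while the gates are being learned.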
ISSN: 1433-7541, 1433-755X
DOI: 10.1007/s10044-025-01450-9