End-to-end model compression via pruning and knowledge distillation for lightweight image super resolution
Published in: Pattern Analysis and Applications (PAA), Vol. 28, No. 2
Main Authors: , , ,
Format: Journal Article
Language: English
Published: London: Springer London; Springer Nature B.V., 01.06.2025
Summary: Obtaining lightweight models is crucial for effectively addressing image super-resolution (SR), particularly on resource-constrained devices. Pruning and knowledge distillation (KD) are two commonly employed methods for compressing SR models. However, previous approaches have typically applied these methods either individually or sequentially, starting with pruning and following with KD. This sequential, pipeline approach compartmentalizes the two techniques and fails to fully leverage the teacher model's knowledge to guide the selection of redundant channels, often resulting in suboptimal pruning outcomes. To address these issues, we propose an end-to-end compression strategy named Pruning While Knowledge Distillation (PWKD) that integrates pruning and KD into a single training process, significantly enhancing training efficiency. Furthermore, this novel integrated compression method allows the teacher model to provide guidance during the pruning process, rather than after pruning has been completed. By leveraging knowledge distillation from the teacher model to guide pruning channel selection, it resolves the suboptimal channel selection commonly encountered in sequential training methods. This integration not only streamlines the training process for greater efficiency but also improves performance by ensuring more informed and effective pruning decisions. In addition, for pruning, we design an auto-pruning module that utilizes feature information to adaptively learn pruning masks, enabling automated pruning without manually defined criteria. Moreover, we design a backward-differentiable gating function that ensures the auto-pruning module remains differentiable during backpropagation. Furthermore, to address the challenges of KD, we introduce the Multiscale Wavelet Refine Module (MVRM). Designed to enhance the processing of image edges and intricate textures, MVRM significantly boosts the student model's ability to replicate the teacher model's proficiency in restoring high-frequency information. Our integrated approach has been tested across multiple datasets and consistently demonstrates significant performance improvements.
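The abstract describes two mechanisms that a short sketch can make concrete: a feature-driven pruning gate that stays differentiable during backpropagation, and a distillation loss computed on multiscale wavelet sub-bands to emphasize edges and textures. The PyTorch sketch below is only an illustration of those ideas, not the authors' implementation: the names (AutoPruningGate, haar_decompose, wavelet_distill_loss), the straight-through estimator used as the backward-differentiable gate, and the single-level Haar transform per scale are all assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class AutoPruningGate(nn.Module):
    """Hypothetical auto-pruning gate: predicts a per-channel keep/prune mask
    from feature statistics. The forward pass applies a hard 0/1 mask; the
    backward pass lets gradients flow through the soft sigmoid scores
    (a straight-through estimator), keeping the gate differentiable."""

    def __init__(self, channels: int):
        super().__init__()
        self.fc = nn.Linear(channels, channels)  # scores each channel

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        stats = x.mean(dim=(2, 3))                   # (N, C) global average pooling
        soft = torch.sigmoid(self.fc(stats))         # soft keep-probabilities
        hard = (soft > 0.5).float()                  # hard binary mask
        mask = hard + soft - soft.detach()           # straight-through: hard forward, soft backward
        return x * mask.unsqueeze(-1).unsqueeze(-1)  # zero out pruned channels


def haar_decompose(x: torch.Tensor):
    """Single-level 2D Haar transform into LL, LH, HL, HH sub-bands."""
    a, b = x[..., 0::2, 0::2], x[..., 0::2, 1::2]
    c, d = x[..., 1::2, 0::2], x[..., 1::2, 1::2]
    ll = (a + b + c + d) / 4
    lh = (a + b - c - d) / 4
    hl = (a - b + c - d) / 4
    hh = (a - b - c + d) / 4
    return ll, lh, hl, hh


def wavelet_distill_loss(student_sr: torch.Tensor,
                         teacher_sr: torch.Tensor,
                         levels: int = 2) -> torch.Tensor:
    """Multiscale distillation loss comparing the high-frequency
    (edge/texture) sub-bands of the student and teacher outputs."""
    loss = student_sr.new_zeros(())
    s, t = student_sr, teacher_sr
    for _ in range(levels):
        s_ll, *s_high = haar_decompose(s)
        t_ll, *t_high = haar_decompose(t)
        loss = loss + sum(F.l1_loss(sh, th) for sh, th in zip(s_high, t_high))
        s, t = s_ll, t_ll  # recurse on the low-frequency band for the next scale
    return loss
```

Under these assumptions, a joint training step would wrap student feature maps with AutoPruningGate (so channels whose masks collapse to zero can later be removed) and minimize a combined objective such as `F.l1_loss(student_sr, hr) + lambda_kd * wavelet_distill_loss(student_sr, teacher_sr)`, letting the teacher guide channel selection while the gates are being learned.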
ISSN: 1433-7541, 1433-755X
DOI: 10.1007/s10044-025-01450-9