NAP: Neural architecture search with pruning

Bibliographic Details
Published in: Neurocomputing (Amsterdam), Vol. 477, pp. 85-95
Main Authors: Ding, Yadong; Wu, Yu; Huang, Chengyue; Tang, Siliang; Wu, Fei; Yang, Yi; Zhu, Wenwu; Zhuang, Yueting
Format: Journal Article
Language: English
Published: Elsevier B.V., 07.03.2022

Summary:

Figure: Overview of the proposed approach, Neural Architecture Search with Pruning (NAP). (a) We apply an unnormalized relaxation to operation strengths. (b) During the search process, the strengths of operations update inconsistently; we prune operations with weaker strengths. (c) The final architecture we obtain: only the operations with the largest strengths are preserved, while the others are pruned. Note that some input nodes may be discarded in the final architecture.

Highlights:
•NAP alleviates the curse of skip-connect in prior DARTS-like methods.
•The proposed pruning criterion increases the diversity of the derived architecture.
•Extensive experiments show the efficiency and effectiveness of NAP.

Abstract: Neural Architecture Search (NAS) has attracted continuously increasing attention. Owing to their computational efficiency, gradient-based NAS methods such as DARTS have become the most popular framework for NAS tasks. Nevertheless, as the search iterates, the models derived by previous NAS frameworks become dominated by skip-connects, causing a drop in performance. In this work, we present a novel approach to alleviate this issue, named Neural Architecture Search with Pruning (NAP). Unlike prior differentiable architecture search works, our approach draws on ideas from network pruning. We first train an over-parameterized network that includes all candidate operations. We then propose a criterion for pruning the network. Based on a newly designed relaxation of the architecture representation, NAP derives the most potent model by removing trivial and redundant edges from the whole network topology. Experiments show the effectiveness of the proposed approach. Specifically, the model searched by NAP achieves state-of-the-art performance (2.48% test error) on CIFAR-10. Transferred to ImageNet, the model obtains a 25.1% test error with only 5.0 M parameters, which is on par with modern NAS methods.
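To make the abstract's description concrete, the sketch below shows how a single edge of an over-parameterized cell might combine candidate operations with unnormalized (no softmax) strengths and then prune the weaker ones. This is a minimal, hypothetical PyTorch illustration: the class name MixedEdge, the magnitude-based top-k rule in prune_weakest, and all parameter names are assumptions for illustration, not the paper's released code or its exact pruning criterion.

```python
import torch
import torch.nn as nn

class MixedEdge(nn.Module):
    """One edge of an over-parameterized cell: a weighted sum of candidate ops."""

    def __init__(self, candidate_ops):
        super().__init__()
        self.ops = nn.ModuleList(candidate_ops)
        # Unnormalized relaxation: raw learnable strengths (no softmax over ops).
        self.strength = nn.Parameter(torch.ones(len(candidate_ops)))
        # 1 = operation still active, 0 = pruned.
        self.register_buffer("active", torch.ones(len(candidate_ops)))

    def forward(self, x):
        # Weighted sum over the operations that have not been pruned yet.
        return sum(m * a * op(x)
                   for m, a, op in zip(self.active, self.strength, self.ops)
                   if m > 0)

    def prune_weakest(self, keep_k=1):
        # Placeholder criterion: keep the keep_k ops with the largest strengths.
        keep = torch.topk(self.strength.detach().abs(), keep_k).indices
        self.active.zero_()
        self.active[keep] = 1.0


# Example usage: search with all candidates, then prune down to the strongest op.
candidates = [nn.Conv2d(16, 16, 3, padding=1),
              nn.MaxPool2d(3, stride=1, padding=1),
              nn.Identity()]
edge = MixedEdge(candidates)
out = edge(torch.randn(2, 16, 8, 8))  # search phase: all candidates contribute
edge.prune_weakest(keep_k=1)          # pruning phase: weaker ops are removed
```

In this sketch, pruning is applied edge by edge during the search rather than only at the final discretization step, which mirrors the high-level idea in the abstract while leaving the paper's specific pruning criterion unspecified.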
ISSN: 0925-2312; 1872-8286
DOI: 10.1016/j.neucom.2021.12.002