NAP: Neural architecture search with pruning
Published in | Neurocomputing (Amsterdam), Vol. 477, pp. 85-95 |
Format | Journal Article |
Language | English |
Published | Elsevier B.V., 07.03.2022 |
Summary: | Overview of the proposed approach, Neural Architecture search with Pruning (NAP). (a) We apply an unnormalized relaxation to operation strengths. (b) During the search, the operation strengths update at different rates, and operations with weaker strengths are pruned. (c) The final derived architecture: only the operations with the largest strengths are preserved, while the others are pruned. Note that some input nodes might be discarded in the final architecture.
• NAP alleviates the curse of skip-connects in prior DARTS-like methods.
• The proposed pruning criterion increases the diversity of the derived architectures.
• Extensive experiments show the efficiency and effectiveness of NAP.
Neural Architecture Search (NAS) has been attracting continuously increasing attention. Due to their computational efficiency, gradient-based NAS methods such as DARTS have become the most popular framework for NAS tasks. Nevertheless, as the search iterates, the models derived by previous NAS frameworks become dominated by skip-connects, causing a collapse in performance. In this work, we present a novel approach to alleviate this issue, named Neural Architecture search with Pruning (NAP). Unlike prior differentiable architecture search works, our approach draws on ideas from network pruning. We first train an over-parameterized network that includes all candidate operations, and then propose a criterion to prune it. Based on a newly designed relaxation of the architecture representation, NAP derives the most potent model by removing trivial and redundant edges from the whole network topology. Experiments show the effectiveness of the proposed approach. Specifically, the model searched by NAP achieves a state-of-the-art 2.48% test error on CIFAR-10. When transferred to ImageNet, the model obtains a 25.1% test error with only 5.0 M parameters, on par with modern NAS methods.
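To make the mechanism described in the abstract concrete, the sketch below shows one plausible way an unnormalized relaxation and strength-based pruning could be wired together in PyTorch. It is an illustrative toy, not the paper's implementation: the names `MixedOp`, `prune_weakest`, and `candidate_ops`, the initialization, and the per-edge pruning criterion are all assumptions made for this example.

```python
# Hedged sketch (PyTorch): a search edge as an unnormalized weighted sum of
# candidate operations, plus a pruning step that disables the weakest surviving
# operation. All names here are illustrative, not the paper's actual code.
import torch
import torch.nn as nn


class MixedOp(nn.Module):
    """One search edge: sum of candidate ops weighted by raw (unnormalized) strengths."""

    def __init__(self, candidate_ops):
        super().__init__()
        self.ops = nn.ModuleList(candidate_ops)
        # One learnable strength per candidate operation; no softmax is applied,
        # so strengths can grow or shrink independently of one another.
        self.strength = nn.Parameter(1e-3 * torch.randn(len(candidate_ops)))
        # Boolean mask marking which operations are still alive on this edge.
        self.register_buffer("alive", torch.ones(len(candidate_ops), dtype=torch.bool))

    def forward(self, x):
        out = 0.0
        for i, op in enumerate(self.ops):
            if self.alive[i]:
                out = out + self.strength[i] * op(x)
        return out


def prune_weakest(edge: MixedOp, keep_at_least: int = 1) -> None:
    """Disable the weakest still-alive operation on one edge (a toy pruning criterion)."""
    alive_idx = torch.nonzero(edge.alive).flatten()
    if alive_idx.numel() <= keep_at_least:
        return  # keep at least one operation per edge
    strengths = edge.strength.detach().abs()
    weakest = alive_idx[torch.argmin(strengths[alive_idx])]
    edge.alive[weakest] = False


if __name__ == "__main__":
    # Tiny usage example on a single edge with three toy candidate operations.
    ops = [nn.Identity(), nn.Conv2d(8, 8, 3, padding=1), nn.AvgPool2d(3, stride=1, padding=1)]
    edge = MixedOp(ops)
    x = torch.randn(2, 8, 16, 16)
    y = edge(x)          # forward through the over-parameterized edge
    prune_weakest(edge)  # remove the currently weakest operation
    print(y.shape, edge.alive)
```

In NAP itself the pruning criterion and schedule are those proposed in the paper; the toy above only illustrates how raw, softmax-free operation strengths let weak operations be removed while the search continues.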
ISSN: | 0925-2312; 1872-8286 |
DOI: | 10.1016/j.neucom.2021.12.002 |