NASA+: Neural Architecture Search and Acceleration for Multiplication-Reduced Hybrid Networks

Bibliographic Details
Published in IEEE transactions on circuits and systems. I, Regular papers Vol. 70; no. 6; pp. 1 - 14
Main Authors Shi, Huihong; You, Haoran; Wang, Zhongfeng; Lin, Yingyan
Format Journal Article
Language English
Published New York IEEE 01.06.2023
The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Summary: Multiplication is arguably the most computation-intensive operation in modern deep neural networks (DNNs), limiting their extensive deployment on resource-constrained devices. Consequently, pioneering works have handcrafted multiplication-free DNNs, which are hardware-efficient but generally inferior to their multiplication-based counterparts in task accuracy, calling for multiplication-reduced hybrid DNNs to marry the best of both worlds. To this end, we propose a Neural Architecture Search and Acceleration (NASA) framework for the above hybrid models, dubbed NASA+, to boost both task accuracy and hardware efficiency. Specifically, NASA+ augments the state-of-the-art (SOTA) search space with multiplication-free operators to construct hybrid ones, and then adopts a novel progressive pretraining strategy to enable effective search. Furthermore, NASA+ develops a chunk-based accelerator with novel reconfigurable processing elements to better support the searched hybrid models, and integrates an auto-mapper to search for optimal dataflows. Experimental results and ablation studies consistently validate the effectiveness of our NASA+ algorithm-hardware co-design framework, e.g., we can achieve up to 65.1% lower energy-delay-product with comparable accuracy over the SOTA multiplication-based system on CIFAR100. Code is available at https://github.com/GATECH-EIC/NASA.
ISSN: 1549-8328, 1558-0806
DOI: 10.1109/TCSI.2023.3256700
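
Note on the metric: the summary reports its hardware gain as a reduction in energy-delay-product. A minimal sketch of how such a percentage is typically derived follows, assuming the common definition of the metric as energy multiplied by latency. The function names and the measurements are hypothetical, chosen only for illustration, and are not taken from the paper or its code repository.

```python
def edp(energy_joules: float, delay_seconds: float) -> float:
    """Energy-delay product: energy multiplied by latency (lower is better)."""
    return energy_joules * delay_seconds


def edp_reduction_percent(baseline_edp: float, candidate_edp: float) -> float:
    """Percentage by which candidate_edp is lower than baseline_edp."""
    return 100.0 * (1.0 - candidate_edp / baseline_edp)


# Hypothetical measurements for illustration only (not results from the paper).
baseline = edp(energy_joules=2.0, delay_seconds=0.010)  # multiplication-based system
hybrid = edp(energy_joules=1.0, delay_seconds=0.008)    # multiplication-reduced system

print(f"EDP reduction: {edp_reduction_percent(baseline, hybrid):.1f}%")
```

Because the metric is a product, a design that halves energy while also shortening latency improves it by more than either factor alone.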