7.1 An 11.5TOPS/W 1024-MAC Butterfly Structure Dual-Core Sparsity-Aware Neural Processing Unit in 8nm Flagship Mobile SoC

Bibliographic Details
Published in	Digest of technical papers - IEEE International Solid-State Circuits Conference pp. 130 - 132
Main Authors	Song, Jinook, Cho, Yunkyo, Park, Jun-Seok, Jang, Jun-Woo, Lee, Sehwan, Song, Joon-Ho, Lee, Jae-Gon, Kang, Inyup
Format	Conference Proceeding
Language	English
Published	IEEE 01.02.2019
Subjects	Bandwidth, Central Processing Unit, Clocks, Kernel, Neural networks, Parallel processing, Semiconductor device measurement
Summary:	Deep learning has been widely applied to image and speech recognition. Response time, connectivity, privacy and security drive applications towards mobile platforms rather than the cloud. For mobile systems-on-a-chip (SoCs), energy-efficient neural processing units (NPUs) have been studied for performing the convolutional layers (CLs) and fully-connected layers (FCLs) [2-5] in deep neural networks. Moreover, as neural networks are getting deeper, the NPU needs to integrate 1K or more multiply/accumulate (MAC) units. For energy efficiency, compression of neural networks has been studied by pruning neural connections and quantizing weights and features to 8b or even lower fixed-point precision without accuracy loss [1]. One hardware accelerator exploited network sparsity to achieve high utilization of its MAC units [3]. However, since it is challenging to predict where pruning is possible, that accelerator needed complex circuitry for selecting an array of features corresponding to an array of non-zero weights. To reduce the power of MAC operations, bit-serial multipliers have been applied [5]. In general, neural networks with extremely low or variable bit precision need to be carefully trained.
ISSN:	2376-8606
DOI:	10.1109/ISSCC.2019.8662476
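
Illustration (not from the paper): the summary mentions quantizing weights to 8b fixed point and selecting the features that correspond to non-zero weights. The sketch below shows both ideas in plain Python, as a software analogy only. The function names, the symmetric power-of-two scale, and the sample values are all assumptions made for illustration; they do not describe the NPU's actual circuitry or number format.

```python
def quantize_int8(values, scale):
    """Symmetric 8b quantization (assumed scheme): round(v / scale), clamped to [-128, 127]."""
    return [max(-128, min(127, round(v / scale))) for v in values]


def dense_mac(weights, features):
    """Reference dot product: performs one MAC per weight, zeros included."""
    acc = 0
    for w, x in zip(weights, features):
        acc += w * x
    return acc, len(weights)


def sparse_mac(weights, features):
    """Zero-skipping dot product: performs MACs only for non-zero weights."""
    # Keep the position of every non-zero weight so the matching feature
    # can be selected; pruned (zero) weights cost no MAC at all.
    nonzero = [(i, w) for i, w in enumerate(weights) if w != 0]
    acc = 0
    for i, w in nonzero:
        acc += w * features[i]
    return acc, len(nonzero)


if __name__ == "__main__":
    # Hypothetical pruned weights; half of them are zero.
    float_weights = [0.5, 0.0, -0.25, 0.0, 0.0, 1.0, 0.0, 0.125]
    weights = quantize_int8(float_weights, scale=2 ** -7)
    features = [3, 7, -2, 5, 1, 4, -6, 2]
    print(dense_mac(weights, features))
    print(sparse_mac(weights, features))
```

Both functions return the same accumulated value, but the zero-skipping version reports fewer MAC operations. The index bookkeeping in `sparse_mac` is the software counterpart of the feature-selection logic that the summary describes as requiring complex circuitry in hardware.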