A 510-nW Wake-Up Keyword-Spotting Chip Using Serial-FFT-Based MFCC and Binarized Depthwise Separable CNN in 28-nm CMOS
Published in: IEEE Journal of Solid-State Circuits, Vol. 56, No. 1, pp. 151–164
Main Authors:
Format: Journal Article
Language: English
Published: New York: The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 01.01.2021
Summary: We propose a sub-μW always-on keyword spotting (μKWS) chip for audio wake-up systems. It is mainly composed of a neural network (NN) and a feature extraction (FE) circuit. To significantly reduce the memory footprint and computational load, four techniques are used to achieve ultra-low power consumption: 1) a serial-FFT-based Mel-frequency cepstrum coefficient (MFCC) circuit is designed for FE, instead of the common parallel FFT; 2) a small-sized binarized depthwise separable convolutional NN (DSCNN) is designed as the classifier; 3) a framewise incremental computation technique is devised, in contrast to conventional whole-word processing; and 4) the reduced computation allows a low system clock frequency, which enables near-threshold-voltage operation, and low-leakage memory blocks are designed to minimize leakage power. Implemented in 28-nm CMOS technology, this μKWS consumes 0.51 μW at a 40-kHz clock frequency and a 0.41-V supply, with an area of 0.23 mm². Using the Google Speech Commands data set, 97.3% accuracy is reached for a one-word KWS task and 94.6% for a two-word task.
ISSN: 0018-9200, 1558-173X
DOI: 10.1109/JSSC.2020.3029097
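The summary above centers on a serial-FFT-based MFCC front end feeding a binarized DSCNN classifier. As a rough point of reference only, below is a minimal NumPy sketch of a generic MFCC pipeline (framing, FFT, Mel filterbank, log, DCT). The frame length, hop size, filter count, and number of cepstral coefficients are illustrative assumptions, not values from the paper, and the chip's serial-FFT hardware, weight binarization, and framewise incremental computation are not modeled here.

```python
# Minimal software sketch of an MFCC front end (framing -> FFT -> Mel filterbank
# -> log -> DCT). Plain NumPy analogue for illustration only; the chip described
# above computes the FFT serially in dedicated hardware, which is not modeled.
import numpy as np

def mel_filterbank(num_filters, n_fft, sample_rate):
    """Triangular Mel filterbank matrix of shape (num_filters, n_fft // 2 + 1)."""
    def hz_to_mel(f):
        return 2595.0 * np.log10(1.0 + f / 700.0)
    def mel_to_hz(m):
        return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2), num_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sample_rate).astype(int)

    fbank = np.zeros((num_filters, n_fft // 2 + 1))
    for i in range(1, num_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):
            fbank[i - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):
            fbank[i - 1, k] = (right - k) / max(right - center, 1)
    return fbank

def mfcc_frames(signal, sample_rate=16000, frame_len=512, hop=256,
                num_filters=40, num_ceps=10):
    """Return one MFCC vector per frame: array of shape (num_frames, num_ceps)."""
    window = np.hamming(frame_len)
    fbank = mel_filterbank(num_filters, frame_len, sample_rate)
    coeffs = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len] * window
        spectrum = np.abs(np.fft.rfft(frame)) ** 2           # power spectrum
        log_mel = np.log(np.maximum(fbank @ spectrum, 1e-10))  # avoid log(0)
        # DCT-II of the log-Mel energies; keep the first num_ceps coefficients.
        n = np.arange(num_filters)
        dct = np.cos(np.pi * np.outer(np.arange(num_ceps), n + 0.5) / num_filters)
        coeffs.append(dct @ log_mel)
    return np.array(coeffs)

# Example: MFCCs for one second of a synthetic 16-kHz tone.
if __name__ == "__main__":
    t = np.arange(16000) / 16000.0
    features = mfcc_frames(np.sin(2 * np.pi * 440.0 * t))
    print(features.shape)  # (num_frames, 10)
```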