Pianissimo: A Sub-mW Class DNN Accelerator With Progressively Adjustable Bit-Precision


Bibliographic Details
Published in: IEEE Access, Vol. 12, pp. 2057-2073
Main Authors: Suzuki, Junnosuke; Yu, Jaehoon; Yasunaga, Mari; Garcia-Arias, Angel Lopez; Okoshi, Yasuyuki; Kumazawa, Shungo; Ando, Kota; Kawamura, Kazushi; Van Chu, Thiem; Motomura, Masato
Format: Journal Article
Language: English
Published: Piscataway: The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 2024

Summary: With the widespread adoption of edge AI, the diversity of application requirements and fluctuating computational demands present significant challenges. Conventional accelerators suffer from increased memory footprints due to the need for multiple models to adapt to these varied requirements over time. In such dynamic edge conditions, it is crucial to accommodate these changing computational needs within strict memory and power constraints while maintaining the flexibility to support a wide range of applications. In response to these challenges, this article proposes a sub-mW class inference accelerator called Pianissimo that achieves competitive power efficiency while flexibly adapting to changing edge environment conditions at the architecture level. The heart of the design concept is a novel datapath architecture with a progressive bit-by-bit datapath. This unique datapath is augmented by software-hardware (SW-HW) cooperative control with a reduced instruction set computer processor and HW counters. The integrated SW-HW control enables adaptive inference schemes of adaptive/mixed precision and Block Skip, optimizing the balance between computational efficiency and accuracy. The 40 nm chip, with 1104 KB memory, dissipates 793-1032 µW at 0.7 V on MobileNetV1, achieving 0.49-1.25 TOPS/W at this ultra-low power range.
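The "progressive bit-by-bit datapath" described in the abstract processes operands one bit-plane at a time, so a computation can stop early at a coarser precision or continue for higher accuracy. A minimal sketch of this general bit-serial idea (illustrative only; the function name, unsigned-activation assumption, and MSB-first ordering are assumptions for the sketch, not details taken from the paper):

```python
def progressive_dot(acts, weights, total_bits, use_bits):
    """Approximate dot(acts, weights) using only the `use_bits`
    most-significant bit-planes of the unsigned activations.

    With use_bits == total_bits the result is exact; fewer bits
    trades accuracy for fewer bit-serial steps (energy/latency).
    """
    acc = 0
    for b in range(total_bits - 1, total_bits - 1 - use_bits, -1):
        # Extract bit-plane b from every activation (each 0 or 1).
        plane = [(a >> b) & 1 for a in acts]
        # One bit-serial step: a binary dot product, scaled by 2**b.
        acc += (1 << b) * sum(p * w for p, w in zip(plane, weights))
    return acc

# Example: acts = [5, 3] (3-bit), weights = [2, 4].
# Full precision recovers 5*2 + 3*4 = 22; one MSB plane gives a
# coarse early estimate.
full = progressive_dot([5, 3], [2, 4], total_bits=3, use_bits=3)
coarse = progressive_dot([5, 3], [2, 4], total_bits=3, use_bits=1)
```

Accumulating planes MSB-first is what makes the precision "progressively adjustable": each additional step only refines the running partial sum, so the same datapath serves both low-precision and high-precision inference.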
ISSN: 2169-3536
DOI: 10.1109/ACCESS.2023.3347578