22.1 A 12.4TOPS/W @ 136GOPS AI-IoT System-on-Chip with 16 RISC-V, 2-to-8b Precision-Scalable DNN Acceleration and 30%-Boost Adaptive Body Biasing

Bibliographic Details
Published in 2023 IEEE International Solid-State Circuits Conference (ISSCC), pp. 21-23
Main Authors Conti, Francesco; Rossi, Davide; Paulin, Gianna; Garofalo, Angelo; Di Mauro, Alfio; Rutishauser, Georg; Ottavi, Gianmarco; Eggimann, Manuel; Okuhara, Hayate; Huard, Vincent; Montfort, Olivier; Jure, Lionel; Exibard, Nils; Gouedo, Pascal; Louvat, Mathieu; Botte, Emmanuel; Benini, Luca
Format Conference Proceeding
Language English
Published IEEE 19.02.2023
Summary: Emerging Artificial Intelligence-enabled Internet-of-Things (AI-IoT) SoCs [1-4] for augmented reality, personalized healthcare and nano-robotics need to run a large variety of tasks within a power envelope of a few tens of mW: compute-intensive but bit-precision-tolerant Deep Neural Networks (DNNs), as well as signal processing and control requiring high-precision floating-point. Performance and energy constraints vary greatly between different applications and even within different stages of the same application. We present Marsellus (Fig. 22.1.1), an all-digital AI-IoT end-node heterogeneous SoC fabricated in GlobalFoundries 22nm FDX that combines three key contributions to enable aggressive scaling of performance and energy: 1) a general-purpose cluster of 16 RISC-V DSP cores attuned for execution of a diverse range of workloads exploiting 4b and 2b arithmetic extensions (XpulpNN), combined with fused MAC & LOAD (M&L) operations and floating-point support; 2) a 2-8b reconfigurable binary engine to accelerate 3×3 and 1×1 (pointwise) convolutions in DNNs; 3) a set of On-Chip Monitoring (OCM) blocks connected to an Adaptive Body Bias (ABB) generator and a hardware control loop, enabling on-the-fly adaptation of transistor threshold voltages.
ISSN: 2376-8606
DOI: 10.1109/ISSCC42615.2023.10067643
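
Note on the "2-8b reconfigurable binary engine" in the summary: the record does not describe the engine's datapath, so the following Python sketch is only an illustration of one common way precision scalability can be built from binary operations, not a description of Marsellus itself. It assumes unsigned operands and decomposes a multi-bit dot product into 1-bit AND products that are counted and shifted; the function names `bitplane_dot` and `reference_dot` are hypothetical.

```python
import random


def bitplane_dot(weights, activations, w_bits, a_bits):
    """Unsigned dot product built only from 1-bit AND products.

    Assumption for illustration: operands are unsigned integers that fit in
    w_bits / a_bits. Each pair of bit planes (i, j) yields a binary dot
    product (AND, then count the ones), which is scaled by 2**(i + j).
    """
    total = 0
    for i in range(w_bits):        # weight bit plane
        for j in range(a_bits):    # activation bit plane
            partial = sum(((w >> i) & 1) & ((a >> j) & 1)
                          for w, a in zip(weights, activations))
            total += partial << (i + j)
    return total


def reference_dot(weights, activations):
    """Plain integer dot product used as a correctness check."""
    return sum(w * a for w, a in zip(weights, activations))


if __name__ == "__main__":
    rng = random.Random(0)
    # Sweep the 2b, 4b and 8b precisions named in the summary on a
    # 9-element vector (the size of one 3x3 convolution window).
    for w_bits in (2, 4, 8):
        for a_bits in (2, 4, 8):
            w = [rng.randrange(1 << w_bits) for _ in range(9)]
            a = [rng.randrange(1 << a_bits) for _ in range(9)]
            assert bitplane_dot(w, a, w_bits, a_bits) == reference_dot(w, a)
    print("bit-plane decomposition matches the reference dot product")
```

In this sketch the number of binary passes is `w_bits * a_bits`, so a 2b×2b product needs 4 passes where an 8b×8b product needs 64. That is the general reason lower precision can trade accuracy for throughput and energy in such a scheme.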