11.3 Metis AIPU: A 12nm 15TOPS/W 209.6TOPS SoC for Cost- and Energy-Efficient Inference at the Edge

Bibliographic Details
Published in: 2024 IEEE International Solid-State Circuits Conference (ISSCC), Vol. 67, pp. 212-214
Main Authors: Hager, Pascal Alexander; Moons, Bert; Cosemans, Stefan; Papistas, Ioannis A.; Rooseleer, Bram; Loon, Jeroen Van; Uytterhoeven, Roel; Zaruba, Florian; Koumousi, Spyridoula; Stanisavljevic, Milos; Mach, Stefan; Mutsaards, Sebastiaan; Aljameh, Riduan Khaddam; Khov, Gua Hao; Machiels, Brecht; Olar, Cristian; Psarras, Anastasios; Geursen, Sander; Vermeeren, Jeroen; Lu, Yi; Maringanti, Abhishek; Ameta, Deepak; Katselas, Leonidas; Hutter, Noah; Schmuck, Manuel; Sivadas, Swetha; Sharma, Karishma; Oliveira, Manuel; Aerne, Ramon; Sharma, Nitish; Soni, Timir; Bussolino, Beatrice; Pesut, Djordje; Pallaro, Michele; Podlesnii, Andrei; Lyrakis, Alexios; Ruffiner, Yannick; Dazzi, Martino; Thiele, Johannes; Goetschalckx, Koen; Bruschi, Nazareno; Doevenspeck, Jonas; Verhoef, Bram; Linz, Stefan; Garcea, Giuseppe; Ferguson, Jonathan; Koltsidas, Ioannis; Eleftheriou, Evangelos
Format: Conference Proceeding
Language: English
Published: IEEE, 18.02.2024

Summary: The Metis AI Processing Unit (AIPU) is a quad-core System-on-Chip (SoC) designed for edge inference, executing all components of an AI workload on-chip. Each of the four AI cores delivers 52.4 TOPS, for an aggregate throughput of 209.6 TOPS (4 × 52.4 TOPS). Key features of the Metis AIPU and its integration into a PCIe card-based system are shown in Fig. 11.3.1. Metis leverages a quantized digital in-memory computing (D-IMC) architecture - with 8b weights, 8b activations, and full-precision accumulation - to reduce both the memory cost of weights and activations and the energy consumption of matrix-vector multiplications (MVM), without compromising neural network accuracy.
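
The "8b weights, 8b activations, full-precision accumulation" scheme described above can be illustrated with a short sketch. The NumPy code below is a minimal, hypothetical model of such a quantized MVM under per-tensor scaling assumptions; it is not the Metis D-IMC implementation, and all names (quantized_mvm, w_scale, a_scale) are illustrative.

import numpy as np

def quantized_mvm(weights_q, activations_q, w_scale, a_scale):
    """Integer matrix-vector multiply with full-precision accumulation."""
    # Accumulate in int32 so the 8b x 8b partial products cannot overflow,
    # mirroring the full-precision accumulation described in the summary.
    acc = weights_q.astype(np.int32) @ activations_q.astype(np.int32)
    # Dequantize the accumulator back to floating point.
    return acc.astype(np.float32) * (w_scale * a_scale)

# Toy usage: quantize a small float layer to int8 and compare with the
# float reference result.
rng = np.random.default_rng(0)
w = rng.standard_normal((4, 8)).astype(np.float32)   # weights
x = rng.standard_normal(8).astype(np.float32)        # activations

w_scale = np.abs(w).max() / 127.0   # per-tensor quantization scales
a_scale = np.abs(x).max() / 127.0
w_q = np.clip(np.round(w / w_scale), -128, 127).astype(np.int8)
x_q = np.clip(np.round(x / a_scale), -128, 127).astype(np.int8)

print(quantized_mvm(w_q, x_q, w_scale, a_scale))  # close to the reference
print(w @ x)                                      # float reference
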
ISSN: 2376-8606
DOI: 10.1109/ISSCC49657.2024.10454395