A Scalable Multi-TeraOPS Deep Learning Processor Core for AI Training and Inference

Bibliographic Details
Published in: 2018 IEEE Symposium on VLSI Circuits, pp. 35-36
Main Authors: Fleischer, Bruce, Shukla, Sunil, Ziegler, Matthew, Silberman, Joel, Jinwook Oh, Srinivasan, Vijayalakshmi, Jungwook Choi, Mueller, Silvia, Agrawal, Ankur, Babinsky, Tina, Nianzheng Cao, Chia-Yu Chen, Chuang, Pierce, Fox, Thomas, Gristede, George, Guillorn, Michael, Haynie, Howard, Klaiber, Michael, Dongsoo Lee, Shih-Hsien Lo, Maier, Gary, Scheuermann, Michael, Venkataramani, Swagath, Vezyrtzis, Christos, Naigang Wang, Fanchieh Yee, Ching Zhou, Pong-Fei Lu, Curran, Brian, Leland Chang, Gopalakrishnan, Kailash
Format: Conference Proceeding
Language: English
Published: IEEE, 01.06.2018
DOI: 10.1109/VLSIC.2018.8502276

Summary: A multi-TOPS AI core is presented for acceleration of deep learning training and inference in systems from edge devices to data centers. With a programmable architecture and custom ISA, this engine achieves >90% sustained utilization across the range of neural network topologies by employing a dataflow architecture and an on-chip scratchpad hierarchy. Compute precision is optimized at 16b floating point (fp16) for high model accuracy in training and inference as well as 1b/2b (binary/ternary) integer for aggressive inference performance. At 1.5 GHz, the AI core prototype achieves 1.5 TFLOPS fp16, 12 TOPS ternary, or 24 TOPS binary peak performance in 14nm CMOS.
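As a quick sanity check on the quoted figures (this arithmetic is not taken from the paper itself), peak throughput is simply clock frequency times operations completed per cycle. The short Python sketch below inverts the abstract's numbers into per-cycle operation counts; the variable names are illustrative only.

    # Back-of-the-envelope check of the peak-performance figures in the
    # abstract: peak ops/s = clock frequency (Hz) * operations per cycle.

    CLOCK_HZ = 1.5e9  # 1.5 GHz prototype clock

    # Peak rates quoted in the abstract.
    peaks = {
        "fp16":    1.5e12,  # 1.5 TFLOPS
        "ternary": 12e12,   # 12 TOPS
        "binary":  24e12,   # 24 TOPS
    }

    for precision, ops_per_sec in peaks.items():
        ops_per_cycle = ops_per_sec / CLOCK_HZ
        ratio = ops_per_sec / peaks["fp16"]
        print(f"{precision:>7}: {ops_per_cycle:6.0f} ops/cycle "
              f"({ratio:.0f}x fp16 throughput)")

This works out to 1000 fp16 ops/cycle, 8000 ternary ops/cycle, and 16000 binary ops/cycle; the resulting 8x and 16x ratios over fp16 are consistent with the abstract's framing of 2b ternary and 1b binary modes as aggressive inference options on the same core.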