A Scalable Multi-TeraOPS Deep Learning Processor Core for AI Training and Inference
Published in | 2018 IEEE Symposium on VLSI Circuits pp. 35 - 36 |
---|---|
Format | Conference Proceeding |
Language | English |
Published | IEEE, 01.06.2018 |
DOI | 10.1109/VLSIC.2018.8502276 |
Summary: | A multi-TOPS AI core is presented for acceleration of deep learning training and inference in systems from edge devices to data centers. With a programmable architecture and custom ISA, this engine achieves >90% sustained utilization across the range of neural network topologies by employing a dataflow architecture and an on-chip scratchpad hierarchy. Compute precision is optimized at 16b floating point (fp16) for high model accuracy in training and inference as well as 1b/2b (binary/ternary) integer for aggressive inference performance. At 1.5 GHz, the AI core prototype achieves 1.5 TFLOPS fp16, 12 TOPS ternary, or 24 TOPS binary peak performance in 14nm CMOS. |
---|---|
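
The peak figures in the summary imply a per-cycle operation count for each precision mode. The short Python sketch below only divides the quoted peak rates by the quoted 1.5 GHz clock; the names `CLOCK_HZ`, `PEAK_OPS_PER_SEC`, and `ops_per_cycle` are illustrative and do not come from the paper. The summary does not say how an operation is counted (for example, whether a multiply-accumulate counts as one or two), so the results are implied quotients of peak numbers, not a description of the datapath or of sustained throughput.

```python
# Peak figures quoted in the summary, expressed in operations per second.
CLOCK_HZ = 1_500_000_000  # 1.5 GHz

PEAK_OPS_PER_SEC = {
    "fp16": 1_500_000_000_000,       # 1.5 TFLOPS
    "ternary": 12_000_000_000_000,   # 12 TOPS
    "binary": 24_000_000_000_000,    # 24 TOPS
}


def ops_per_cycle(peak_ops_per_sec, clock_hz=CLOCK_HZ):
    """Operations per clock cycle implied by a peak rate (assumption: peak = ops/cycle * clock)."""
    return peak_ops_per_sec / clock_hz


# Implied per-cycle operation count for each precision mode.
per_cycle = {mode: ops_per_cycle(rate) for mode, rate in PEAK_OPS_PER_SEC.items()}

# Peak throughput of each mode relative to fp16.
ratio_vs_fp16 = {mode: count / per_cycle["fp16"] for mode, count in per_cycle.items()}

for mode in PEAK_OPS_PER_SEC:
    print(f"{mode}: {per_cycle[mode]:.0f} ops/cycle, {ratio_vs_fp16[mode]:.0f}x fp16 peak")
```

Under these assumptions the quoted peaks correspond to 1,000 fp16 operations, 8,000 ternary operations, and 16,000 binary operations per cycle, that is, 8x and 16x the fp16 peak rate.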