A 28nm 1.07TFLOPS/mm² Dynamic-Precision Training Processor with Online Dynamic Execution and Multi-Level-Aligned Block-FP Processing
| Published in | 2023 IEEE Custom Integrated Circuits Conference (CICC), pp. 1-2 |
| --- | --- |
| Format | Conference Proceeding |
| Language | English |
| Published | IEEE, 01.04.2023 |
Summary: Training deep learning (DL) models consumes a huge amount of time and energy in cloud servers and edge devices, requiring energy-efficient processors [1-5] to meet the rapidly growing demand for AI. Training processors either use a high-precision floating-point (FP) format, which provides robust training results, or a low-precision format, which increases efficiency but sacrifices accuracy. Mixed-precision training (MPT) is promising for achieving both high accuracy and high efficiency. Manual mixed precision [5] usually applies a coarse-grained (per-layer) mapping, which limits training accuracy. Automatic precision search [6] provides accurate, fine-grained precision mapping, but its high search latency slows down the overall training process.
ISSN: 2152-3630
DOI: 10.1109/CICC57935.2023.10121210
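For context, below is a minimal sketch of the kind of coarse-grained mixed-precision training that the summary contrasts with fine-grained automatic precision search. It uses PyTorch's automatic mixed precision (torch.cuda.amp) and is purely illustrative: the network, sizes, and optimizer settings are arbitrary assumptions, and it does not represent the processor or the precision-mapping scheme proposed in the paper.

```python
# Illustrative only: generic mixed-precision training with PyTorch AMP.
# The model, dimensions, and hyperparameters are arbitrary assumptions,
# not taken from the paper.
import torch
import torch.nn as nn
import torch.nn.functional as F

device = "cuda" if torch.cuda.is_available() else "cpu"
model = nn.Sequential(nn.Linear(512, 512), nn.ReLU(), nn.Linear(512, 10)).to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)
scaler = torch.cuda.amp.GradScaler(enabled=(device == "cuda"))  # loss scaling to avoid FP16 gradient underflow

def train_step(x, y):
    optimizer.zero_grad(set_to_none=True)
    # autocast applies a coarse-grained, per-operator FP16/FP32 mapping;
    # finer-grained (per-layer or per-tensor) mappings require a precision search.
    with torch.cuda.amp.autocast(enabled=(device == "cuda")):
        loss = F.cross_entropy(model(x), y)
    scaler.scale(loss).backward()   # backward pass on the scaled loss
    scaler.step(optimizer)          # unscales gradients, then takes the optimizer step
    scaler.update()                 # adjusts the loss scale for the next iteration
    return loss.item()

# Example usage with random data:
x = torch.randn(32, 512, device=device)
y = torch.randint(0, 10, (32,), device=device)
print(train_step(x, y))
```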