A 28nm 1.07TFLOPS/mm² Dynamic-Precision Training Processor with Online Dynamic Execution and Multi-Level-Aligned Block-FP Processing

Bibliographic Details
Published in: 2023 IEEE Custom Integrated Circuits Conference (CICC), pp. 1-2
Main Authors: Yang, Yixiong; Liu, Ruoyang; Wei, Chenhan; Wang, Wenxun; Sun, Wenyu; Yue, Jinshan; Yang, Huazhong; Liu, Yongpan
Format: Conference Proceeding
Language: English
Published: IEEE, 01.04.2023
Summary: Training deep learning (DL) models consumes a huge amount of time and energy in cloud servers and edge devices, requiring energy-efficient processors [1-5] to meet the rapidly growing demand for AI. Training processors either use a high-precision floating-point (FP) format to provide robust training results, or a low-precision format that increases efficiency but sacrifices accuracy. Mixed-precision training (MPT) is promising for achieving both high accuracy and high efficiency. Manual mixed precision [5] usually relies on a coarse-grained (per-layer) mapping, which limits training accuracy. Automatic precision search [6] provides an accurate, fine-grained precision mapping, but its high search latency slows down the overall training process.
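
To make the coarse-grained, per-layer mapping discussed above concrete, the sketch below shows manual mixed-precision training in PyTorch. The precision_map, model, and batch are hypothetical illustrations, not the paper's method: the processor described here implements dynamic precision in hardware, while this sketch only mimics the per-layer mapping idea in software using torch.autocast and gradient scaling.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

device = "cuda"

# Toy model; the real workloads targeted by the paper are full DL training jobs.
model = nn.Sequential(nn.Linear(512, 512), nn.ReLU(), nn.Linear(512, 10)).to(device)
opt = torch.optim.SGD(model.parameters(), lr=0.01)
scaler = torch.cuda.amp.GradScaler()  # loss scaling keeps FP16 gradients in range

# Hypothetical coarse-grained mapping: one precision choice per layer.
# True -> run the layer in FP16 under autocast; False -> keep it in FP32.
precision_map = {0: True, 1: True, 2: False}  # e.g. keep the classifier in FP32

def forward_mixed(x):
    for idx, layer in enumerate(model):
        use_fp16 = precision_map.get(idx, False)
        with torch.autocast(device, enabled=use_fp16):
            if not use_fp16:
                x = x.float()  # cast activations back before an FP32 layer
            x = layer(x)
    return x

# Dummy batch, for illustration only.
x = torch.randn(64, 512, device=device)
y = torch.randint(0, 10, (64,), device=device)

opt.zero_grad()
loss = F.cross_entropy(forward_mixed(x), y)
scaler.scale(loss).backward()
scaler.step(opt)
scaler.update()
```

A fine-grained automatic search, as in [6], would instead choose a precision per tensor or per operation during training, which is what drives up the search latency the summary refers to.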
ISSN: 2152-3630
DOI: 10.1109/CICC57935.2023.10121210