TetriX: Flexible Architecture and Optimal Mapping for Tensorized Neural Network Processing
Published in: IEEE Transactions on Computers, Vol. 73, No. 5, pp. 1219-1232
Main Authors: , ,
Format: Journal Article
Language: English
Published: New York: The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 01.05.2024
Summary: The continuous growth of deep neural network model size and complexity hinders the adoption of large models on resource-constrained platforms. Tensor decomposition has been shown to be effective in reducing model size by large compression ratios, but the resulting tensorized neural networks (TNNs) require complex and versatile tensor shaping for tensor contraction, causing low processing efficiency on existing hardware architectures. This work presents TetriX, a co-design of a flexible architecture and optimal workload mapping for efficient and flexible TNN processing. TetriX adopts a unified processing architecture that supports both inner and outer products. A hybrid mapping scheme is proposed to eliminate complex tensor shaping by alternating between inner and outer products in a sequence of tensor contractions. Finally, a mapping-aware contraction sequence search (MCSS) is proposed to identify the contraction sequence and workload mapping that achieve the optimal latency on TetriX. Remarkably, combining TetriX with MCSS outperforms the single-mode inner-product and outer-product baselines by up to 46.8× in performance across the collected TNN workloads. TetriX is the first work to support all existing tensor decomposition methods. Compared to a TNN accelerator designed for the hierarchical Tucker method, TetriX achieves improvements of 6.5× and 1.1× in inference throughput and efficiency, respectively.
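The contraction sequence search mentioned in the summary exploits a classic fact: the order in which a chain of tensor contractions is evaluated changes the total operation count, often dramatically. The paper's MCSS additionally accounts for hardware mapping; as a minimal illustration of the ordering effect alone, the sketch below brute-forces the cheapest parenthesization for a matrix-chain special case. This is not the paper's algorithm, and all names here are hypothetical.

```python
# Illustrative sketch only: exhaustive contraction-order search for a
# chain of matrix products (a special case of tensor contraction).
# The real MCSS also models the accelerator's inner/outer-product mapping.
from functools import lru_cache

def best_contraction_order(dims):
    """dims[i] x dims[i+1] is the shape of the i-th factor.
    Returns (minimum multiply count, parenthesization string)."""
    @lru_cache(maxsize=None)
    def solve(i, j):
        if j - i == 1:               # a single factor: nothing to contract
            return 0, str(i)
        best = None
        for k in range(i + 1, j):    # try every split point
            lc, lp = solve(i, k)
            rc, rp = solve(k, j)
            # cost of the final contraction at this split
            cost = lc + rc + dims[i] * dims[k] * dims[j]
            if best is None or cost < best[0]:
                best = (cost, f"({lp} x {rp})")
        return best
    return solve(0, len(dims) - 1)

# Three factors: 10x100, 100x5, 5x50.
# Left-to-right costs 7500 multiplies; the other order costs 75000.
cost, order = best_contraction_order((10, 100, 5, 50))
print(cost, order)  # 7500 ((0 x 1) x 2)
```

The same dynamic-programming structure extends to general tensor networks (as in `numpy.einsum_path`), where the search space and the payoff of a good ordering are even larger.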
ISSN: 0018-9340, 1557-9956
DOI: 10.1109/TC.2024.3365936