TT-MLP: Tensor Train Decomposition on Deep MLPs

Bibliographic Details
Published in	IEEE Access, Vol. 11, p. 1
Main Authors	Yan, Jiale, Ando, Kota, Yu, Jaehoon, Motomura, Masato
Format	Journal Article
Language	English
Published	Piscataway: The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 01.01.2023
Subjects	Computer architecture, Computer vision, Decomposition, Deep learning, deep multilayer perceptron, deep neural networks, low-rank approximation, Mathematical analysis, Mixers, Multilayer perceptrons, network parameter compression, Neural networks, Parameter estimation, Tensor-train decomposition, Tensors, Tradeoffs, Training data

Summary:	Deep multilayer perceptrons (MLPs) have achieved promising performance on computer vision tasks. Deep MLPs consist solely of fully-connected layers, as conventional MLPs do, but adopt more sophisticated network architectures based on mixer layers composed of token-mixing and channel-mixing components. These architectures give deep MLPs global receptive fields, but the significant increase in parameters becomes a heavy burden in practical applications. To tackle this problem, we focus on using tensor-train decomposition (TTD) to compress deep MLPs. First, this paper analyzes deep MLPs under conventional TTD methods, in particular across various designs of the macro framework and the micro blocks: the former concerns how mixer layers are concatenated, and the latter concerns how a single mixer layer is designed. Based on this analysis, we propose a novel TTD method named Train-TTD-Train. The proposed method makes full use of the learning capability of the channel-mixing components and improves the trade-off between accuracy and model size. In the evaluation, the proposed method showed a better trade-off than conventional TTD methods on ImageNet-1K and achieved 0.56% higher inference accuracy with a 15.44% memory reduction on CIFAR-10.
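The summary does not show how TTD shrinks a fully-connected layer, so the following is a minimal sketch of the generic technique, not the paper's Train-TTD-Train method. It applies the standard TT-SVD procedure to one weight matrix in TT-matrix format: the rows and columns are factorized into modes, the modes are interleaved, and successive truncated SVDs produce small 4-D cores. The function names, the 256x512 layer size (loosely standing in for a channel-mixing layer), the mode factorizations, and the rank of 8 are all hypothetical choices for illustration.

```python
import numpy as np

# Hypothetical helpers illustrating generic TT-SVD on an FC weight matrix.
def tt_matrix_decompose(W, out_modes, in_modes, max_rank):
    """Split W (prod(out_modes) x prod(in_modes)) into TT cores of shape (r_prev, m_k, n_k, r_k)."""
    d = len(out_modes)
    assert W.shape == (int(np.prod(out_modes)), int(np.prod(in_modes)))
    # View W as (m1..md, n1..nd), then interleave to (m1, n1, m2, n2, ...).
    T = W.reshape(*out_modes, *in_modes)
    T = T.transpose([i for k in range(d) for i in (k, d + k)])
    sizes = [out_modes[k] * in_modes[k] for k in range(d)]
    cores, rank_prev = [], 1
    C = T.reshape(rank_prev * sizes[0], -1)
    for k in range(d - 1):
        # Truncated SVD: keep at most max_rank singular triplets.
        U, S, Vt = np.linalg.svd(C, full_matrices=False)
        r = min(max_rank, S.size)
        cores.append(U[:, :r].reshape(rank_prev, out_modes[k], in_modes[k], r))
        # Carry the remainder (S @ Vt) on to the next mode.
        C = (S[:r, None] * Vt[:r, :]).reshape(r * sizes[k + 1], -1)
        rank_prev = r
    cores.append(C.reshape(rank_prev, out_modes[-1], in_modes[-1], 1))
    return cores

def tt_matrix_reconstruct(cores):
    """Contract the TT cores back into a dense matrix."""
    d = len(cores)
    out_modes = [G.shape[1] for G in cores]
    in_modes = [G.shape[2] for G in cores]
    full = cores[0]
    for G in cores[1:]:
        # Contract the trailing rank axis with the next core's leading rank axis.
        full = np.tensordot(full, G, axes=([-1], [0]))
    # Drop the boundary rank-1 axes, then de-interleave to (m1..md, n1..nd).
    full = full.reshape([s for k in range(d) for s in (out_modes[k], in_modes[k])])
    perm = list(range(0, 2 * d, 2)) + list(range(1, 2 * d, 2))
    return full.transpose(perm).reshape(int(np.prod(out_modes)), int(np.prod(in_modes)))

# Demo on a random (untrained) 256x512 weight matrix.
W = np.random.default_rng(0).standard_normal((256, 512))
cores = tt_matrix_decompose(W, out_modes=[4, 8, 8], in_modes=[8, 8, 8], max_rank=8)
dense_params = W.size
tt_params = sum(G.size for G in cores)
rel_err = np.linalg.norm(W - tt_matrix_reconstruct(cores)) / np.linalg.norm(W)
print(f"dense: {dense_params}, TT: {tt_params}, relative error: {rel_err:.3f}")
```

With these hypothetical settings the three cores hold 4,864 values in place of 131,072. A random matrix is not low-rank, so its truncation error is large; how accurate a compressed model remains depends on the trained weights and on how training is combined with decomposition, which is the accuracy/size trade-off the paper studies.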
ISSN:	2169-3536
DOI:	10.1109/ACCESS.2023.3240784