TT-MLP: Tensor Train Decomposition on Deep MLPs
Published in: IEEE Access, Vol. 11, p. 1
Format: Journal Article
Language: English
Published: Piscataway: The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 01.01.2023
Summary: Deep multilayer perceptrons (MLPs) have achieved promising performance on computer vision tasks. Deep MLPs consist solely of fully-connected layers, as conventional MLPs do, but adopt more sophisticated network architectures based on mixer layers composed of token-mixing and channel-mixing components. These architectures give deep MLPs global receptive fields, but the significant increase in parameters becomes a massive burden in practical applications. To tackle this problem, we focus on using tensor-train decomposition (TTD) to compress deep MLPs. First, this paper analyzes deep MLPs under conventional TTD methods, especially across various designs of the macro framework and micro blocks: the former is how mixer layers are concatenated, and the latter is how a mixer layer is designed. Based on this analysis, we propose a novel TTD method named Train-TTD-Train. The proposed method exploits the learning capability of channel-mixing components and improves the trade-off between accuracy and size. In the evaluation, the proposed method showed a better trade-off than conventional TTD methods on ImageNet-1K and achieved 0.56% higher inference accuracy with a 15.44% memory reduction on CIFAR-10.
ISSN: 2169-3536
DOI: 10.1109/ACCESS.2023.3240784
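
The summary above describes compressing the fully-connected layers of deep MLPs with tensor-train decomposition (TTD). Below is a minimal illustrative sketch of the general idea, not the paper's Train-TTD-Train method: a single fully-connected weight is held as two small TT cores instead of one dense matrix. The dimension factorization, TT rank, and core shapes are assumptions chosen only for illustration.

```python
# Minimal sketch (assumed shapes and rank, not the paper's method):
# represent a dense weight W of shape (I, O) in tensor-train format.
import numpy as np

# Factor the dimensions as I = i1*i2 and O = o1*o2 and keep TT cores
# G1: (i1, o1, r) and G2: (r, i2, o2), where r is the TT rank.
i1, i2, o1, o2, r = 16, 32, 16, 48, 8          # hypothetical factorization
I, O = i1 * i2, o1 * o2                        # 512 inputs, 768 outputs

rng = np.random.default_rng(0)
G1 = rng.standard_normal((i1, o1, r)) * 0.02   # first TT core
G2 = rng.standard_normal((r, i2, o2)) * 0.02   # second TT core

def tt_linear(x):
    """Apply the TT-format weight to a batch of inputs x of shape (B, I)."""
    B = x.shape[0]
    x4 = x.reshape(B, i1, i2)                  # split the input dimension
    # Contract the input modes against the cores one at a time.
    t = np.einsum('bij,iar->bjar', x4, G1)     # (B, i2, o1, r)
    y = np.einsum('bjar,rjc->bac', t, G2)      # (B, o1, o2)
    return y.reshape(B, O)

dense_params = I * O
tt_params = G1.size + G2.size
print(f"dense: {dense_params} params, TT: {tt_params} params "
      f"({tt_params / dense_params:.1%} of dense)")

y = tt_linear(rng.standard_normal((4, I)))
print(y.shape)                                 # (4, 768)
```

With these assumed shapes the TT cores hold roughly 3.6% of the dense layer's parameters, which is the kind of accuracy-versus-size trade-off the summary refers to; the trade-off is governed by the chosen TT rank and dimension factorization.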