TT-MLP: Tensor Train Decomposition on Deep MLPs


Bibliographic Details
Published in IEEE Access Vol. 11; p. 1
Main Authors Yan, Jiale; Ando, Kota; Yu, Jaehoon; Motomura, Masato
Format Journal Article
Language English
Published Piscataway IEEE 01.01.2023
The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Summary: Deep multilayer perceptrons (MLPs) have achieved promising performance on computer vision tasks. Like conventional MLPs, deep MLPs consist solely of fully-connected layers, but they adopt more sophisticated network architectures based on mixer layers composed of token-mixing and channel-mixing components. These architectures give deep MLPs global receptive fields, but the significant increase in parameters becomes a heavy burden in practical applications. To tackle this problem, we focus on using tensor-train decomposition (TTD) to compress deep MLPs. First, this paper analyzes deep MLPs under conventional TTD methods across various designs of the macro framework and the micro blocks: the former is how mixer layers are concatenated, and the latter is how a mixer layer is designed. Based on this analysis, we propose a novel TTD method named Train-TTD-Train. The proposed method exploits the learning capability of channel-mixing components and improves the trade-off between accuracy and size. In the evaluation, the proposed method showed a better trade-off than conventional TTD methods on ImageNet-1K and achieved a 0.56% higher inference accuracy with a 15.44% memory reduction on CIFAR-10.
ISSN: 2169-3536
DOI: 10.1109/ACCESS.2023.3240784
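
Illustrative note: the summary describes compressing fully-connected layers with tensor-train decomposition. As a rough sketch of what a conventional TTD of a single weight matrix can look like, the NumPy code below applies the generic TT-SVD procedure. It is not the Train-TTD-Train method of the article, and the function names, the 256 x 256 layer size, the mode factorization (4, 8, 8), and the rank cap of 8 are all assumptions chosen for illustration.

```python
import numpy as np


def tt_svd(tensor, max_rank):
    """Generic TT-SVD: split a d-way array into cores of shape (r_prev, dim, r_next)."""
    dims = tensor.shape
    cores = []
    r_prev = 1
    mat = tensor.reshape(r_prev * dims[0], -1)
    for k in range(len(dims) - 1):
        u, s, vt = np.linalg.svd(mat, full_matrices=False)
        r = min(max_rank, len(s))  # truncate to the rank cap
        cores.append(u[:, :r].reshape(r_prev, dims[k], r))
        # Carry the remaining factor forward and expose the next mode.
        mat = (s[:r, None] * vt[:r]).reshape(r * dims[k + 1], -1)
        r_prev = r
    cores.append(mat.reshape(r_prev, dims[-1], 1))
    return cores


def matrix_to_tt(weight, out_modes, in_modes, max_rank):
    """Factor a (prod(out_modes), prod(in_modes)) matrix into TT cores.

    Output and input modes are paired (o_k, i_k) and merged, so core k
    has a middle dimension of o_k * i_k.
    """
    d = len(out_modes)
    t = weight.reshape(*out_modes, *in_modes)
    perm = [axis for k in range(d) for axis in (k, d + k)]  # o1, i1, o2, i2, ...
    merged = [o * i for o, i in zip(out_modes, in_modes)]
    return tt_svd(t.transpose(perm).reshape(merged), max_rank)


def tt_to_matrix(cores, out_modes, in_modes):
    """Contract the cores and undo the mode pairing to recover a dense matrix."""
    d = len(out_modes)
    full = cores[0]
    for core in cores[1:]:
        full = np.tensordot(full, core, axes=([-1], [0]))
    full = full.squeeze(axis=(0, -1))
    paired = [m for pair in zip(out_modes, in_modes) for m in pair]
    inverse = list(range(0, 2 * d, 2)) + list(range(1, 2 * d, 2))
    full = full.reshape(paired).transpose(inverse)
    return full.reshape(int(np.prod(out_modes)), int(np.prod(in_modes)))


# Hypothetical 256 x 256 fully-connected weight (random stand-in values).
rng = np.random.default_rng(0)
W = rng.standard_normal((256, 256))
modes = (4, 8, 8)  # assumed factorization: 4 * 8 * 8 = 256

cores = matrix_to_tt(W, modes, modes, max_rank=8)
n_tt = sum(core.size for core in cores)
W_approx = tt_to_matrix(cores, modes, modes)

print("dense parameters:", W.size)
print("TT parameters:   ", n_tt)
print("relative error:  ", np.linalg.norm(W - W_approx) / np.linalg.norm(W))
```

A random matrix has no low-rank structure, so the reconstruction error in this sketch is large; the accuracy versus size trade-off the summary refers to comes from training the network around the decomposition, which this example does not attempt.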