Permutation Equivariance of Transformers and Its Applications

Bibliographic Details
Published in	arXiv.org
Main Authors	Xu, Hengyuan; Xiang, Liyao; Ye, Hangyu; Yao, Dixi; Chu, Pengzhi; Li, Baochun
Format	Paper
Language	English
Published	Ithaca: Cornell University Library, arXiv.org, 31.03.2024
Subjects	Coders; Equivalence; Learning; Neural networks; Permutations; Privacy; Training; Transformers

Summary:	Revolutionizing the field of deep learning, Transformer-based models have achieved remarkable performance in many tasks. Recent research has recognized that these models are robust to shuffling, but that work is limited to inter-token permutation in the forward propagation. In this work, we propose our definition of permutation equivariance, a broader concept covering both inter- and intra-token permutation in the forward and backward propagation of neural networks. We rigorously prove that this permutation equivariance property is satisfied by most vanilla Transformer-based models with almost no adaptation. We examine the property over a range of state-of-the-art models, including ViT, BERT, GPT, and others, with experimental validation. Further, as a proof of concept, we explore how real-world applications, including privacy-enhancing split learning and model authorization, could exploit the permutation equivariance property, which points to wider, intriguing application scenarios.
ISSN:	2331-8422
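
Illustration (not part of the catalog record): the two kinds of permutation named in the summary can be checked numerically on a toy layer. The sketch below is a minimal, pure-Python single-head self-attention without positional encodings. All names in it (self_attention, conjugate, row_perm, col_perm) are hypothetical and chosen for this example, not taken from the paper. It assumes that intra-token permutation means permuting the feature dimension of the input together with the matching rows and columns of each weight matrix. It covers the forward pass only; the backward-propagation part of the claim is not exercised here.

```python
import math
import random


def matmul(a, b):
    """Plain matrix product of two lists-of-lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def softmax_rows(a):
    """Numerically stable softmax applied to each row."""
    out = []
    for row in a:
        m = max(row)
        exps = [math.exp(v - m) for v in row]
        total = sum(exps)
        out.append([v / total for v in exps])
    return out


def self_attention(x, wq, wk, wv):
    """Single-head attention: softmax(Q K^T / sqrt(d)) V, no positional encoding."""
    q, k, v = matmul(x, wq), matmul(x, wk), matmul(x, wv)
    scale = math.sqrt(len(wq[0]))
    scores = [[s / scale for s in row] for row in matmul(q, transpose(k))]
    return matmul(softmax_rows(scores), v)


def permute_rows(a, perm):
    return [a[i] for i in perm]


def permute_cols(a, perm):
    return [[row[j] for j in perm] for row in a]


def conjugate(w, perm):
    """Assumed weight transform for the intra-token case: permute rows and columns alike."""
    return permute_cols(permute_rows(w, perm), perm)


def max_abs_diff(a, b):
    return max(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))


rng = random.Random(0)


def rand_matrix(rows, cols):
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


n_tokens, d_model = 4, 6
x = rand_matrix(n_tokens, d_model)
wq, wk, wv = (rand_matrix(d_model, d_model) for _ in range(3))

row_perm = [2, 0, 3, 1]        # shuffles the tokens (inter-token)
col_perm = [5, 3, 0, 1, 4, 2]  # shuffles the features (intra-token)

y = self_attention(x, wq, wk, wv)

# Inter-token: shuffling the input tokens shuffles the output tokens the same way.
y_rows = self_attention(permute_rows(x, row_perm), wq, wk, wv)
inter_err = max_abs_diff(y_rows, permute_rows(y, row_perm))

# Intra-token: shuffling features, with matching shuffled weights, shuffles output features.
y_cols = self_attention(
    permute_cols(x, col_perm),
    conjugate(wq, col_perm),
    conjugate(wk, col_perm),
    conjugate(wv, col_perm),
)
intra_err = max_abs_diff(y_cols, permute_cols(y, col_perm))

print("inter-token max deviation:", inter_err)
print("intra-token max deviation:", intra_err)
```

Both deviations should sit at floating-point rounding level, since the permutations only reorder the terms being summed. A model with positional encodings or other order-dependent components would need separate treatment, which this toy check does not address.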