Linear Log-Normal Attention with Unbiased Concentration
Main Authors:
Format: Journal Article
Language: English
Published: 22.11.2023
Summary: Transformer models have achieved remarkable results in a wide range of applications. However, their scalability is hampered by the quadratic time and memory complexity of the self-attention mechanism with respect to the sequence length. This limitation poses a substantial obstacle when dealing with long documents or high-resolution images. In this work, we study the self-attention mechanism by analyzing the distribution of the attention matrix and its concentration ability. Furthermore, we propose instruments to measure these quantities and introduce a novel self-attention mechanism, Linear Log-Normal Attention, designed to emulate the distribution and concentration behavior of the original self-attention. Our experimental results on popular natural language benchmarks reveal that our proposed Linear Log-Normal Attention outperforms other linearized attention alternatives, offering a promising avenue for enhancing the scalability of transformer models.
DOI: 10.48550/arxiv.2311.13541
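
The summary contrasts standard softmax self-attention, whose cost grows quadratically with sequence length, with linearized alternatives such as the proposed Linear Log-Normal Attention. The abstract does not spell out the paper's construction, so the sketch below only illustrates the general kernel-based linearization idea it builds on: replacing softmax(QK^T/sqrt(d)) with a feature-map product phi(Q) phi(K)^T so that a small key-value summary can be formed once, avoiding the n-by-n attention matrix. The feature map `phi = exp` and the function names are illustrative assumptions, not the authors' actual method.

```python
import numpy as np

def softmax_attention(Q, K, V):
    """Standard softmax self-attention; builds the full (n, n) attention
    matrix, so time and memory grow quadratically with sequence length n."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)
    scores -= scores.max(axis=-1, keepdims=True)   # for numerical stability
    A = np.exp(scores)
    A /= A.sum(axis=-1, keepdims=True)
    return A @ V

def linearized_attention(Q, K, V, phi=np.exp):
    """Generic kernel-based linearized attention: softmax(QK^T) is replaced by
    phi(Q) phi(K)^T for some feature map phi, so the (d, d_v) summary
    phi(K)^T V can be formed first and the cost drops to O(n * d * d_v).
    The choice phi = exp here is only an illustrative placeholder."""
    Qp, Kp = phi(Q), phi(K)
    KV = Kp.T @ V                       # (d, d_v) key-value summary
    Z = Qp @ Kp.sum(axis=0)             # per-query normalizer, shape (n,)
    return (Qp @ KV) / Z[:, None]

# Toy usage on random data: both variants map (n, d) inputs to (n, d) outputs.
rng = np.random.default_rng(0)
n, d = 128, 16
Q, K, V = rng.normal(size=(3, n, d))
print(softmax_attention(Q, K, V).shape, linearized_attention(Q, K, V).shape)
```

Reordering the matrix products this way is what removes the n-by-n intermediate; how Linear Log-Normal Attention shapes the resulting score distribution to emulate the concentration behavior of softmax attention is detailed in the paper itself.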