Rare Tokens Degenerate All Tokens: Improving Neural Text Generation via Adaptive Gradient Gating for Rare Token Embeddings
Format | Journal Article |
Language | English |
Published | 07.09.2021 |
Summary: | Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (ACL), Dublin, Ireland, May 2022. Recent studies have shown that the learned token embeddings of large-scale neural language models degenerate into an anisotropic distribution with a narrow-cone shape. This phenomenon, called the representation degeneration problem, increases the overall similarity between token embeddings, which degrades model performance. Although existing methods that address the degeneration problem based on observations of the phenomena it triggers improve text generation performance, the training dynamics of token embeddings behind the problem remain unexplored. In this study, we analyze the training dynamics of token embeddings, focusing on rare token embeddings. We demonstrate that a specific part of the gradient for rare token embeddings is the key cause of the degeneration problem for all tokens during training. Based on this analysis, we propose a novel method called adaptive gradient gating (AGG), which addresses the degeneration problem by gating that specific part of the gradient for rare token embeddings. Experimental results on language modeling, word similarity, and machine translation tasks quantitatively and qualitatively verify the effectiveness of AGG. |
DOI: | 10.48550/arxiv.2109.03127 |
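
For readers who want a concrete picture of what gating the gradient of rare token embeddings can look like, the sketch below is a minimal PyTorch illustration in the spirit of the summary above: tokens flagged as rare have their input-side embedding gradient blocked via `detach()`. The frequency-quantile rarity criterion, the binary gate, and all names (`GatedEmbedding`, `token_counts`, `rare_quantile`) are illustrative assumptions, not the paper's exact AGG formulation, which gates a specific gradient component adaptively rather than blocking it outright.

```python
import torch
import torch.nn as nn

class GatedEmbedding(nn.Module):
    """Token embedding with a per-token gradient gate (illustrative sketch).

    Tokens whose corpus frequency falls below a quantile threshold are
    flagged as rare; their input-side embedding gradient is blocked with
    detach(). This is a crude stand-in for AGG's adaptive gating of a
    specific gradient component, not the paper's exact method.
    """

    def __init__(self, vocab_size, dim, token_counts, rare_quantile=0.5):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, dim)
        threshold = token_counts.float().quantile(rare_quantile)
        # Boolean mask over the vocabulary: True for tokens treated as rare.
        self.register_buffer("is_rare", token_counts.float() < threshold)

    def forward(self, token_ids):
        emb = self.embedding(token_ids)                    # (batch, seq, dim)
        # gate == 1 for frequent tokens, 0 for rare ones.
        gate = (~self.is_rare[token_ids]).unsqueeze(-1).to(emb.dtype)
        # Frequent tokens receive the usual gradient; for rare tokens the
        # gradient through this path is stopped, so their embeddings are not
        # pushed in a common direction by every context they appear in.
        return gate * emb + (1.0 - gate) * emb.detach()
```

A toy usage, with made-up corpus frequencies:

```python
counts = torch.tensor([1000, 500, 3, 800, 2, 1])   # toy token frequencies
layer = GatedEmbedding(vocab_size=6, dim=8, token_counts=counts)
out = layer(torch.tensor([[0, 2, 3, 5]]))          # rare slots carry no grad
```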