Transformer Grammars: Augmenting Transformer Language Models with Syntactic Inductive Biases at Scale
Published in | Transactions of the Association for Computational Linguistics Vol. 10; pp. 1423-1439 |
Main Authors | Laurent Sartran, Samuel Barrett, Adhiguna Kuncoro, Miloš Stanojević, Phil Blunsom, Chris Dyer |
Format | Journal Article |
Language | English |
Published | Cambridge, Massachusetts: MIT Press, 22.12.2022 |
Summary: | We introduce Transformer Grammars (TGs), a novel class of Transformer language models that combine (i) the expressive power, scalability, and strong performance of Transformers and (ii) recursive syntactic compositions, which here are implemented through a special attention mask and deterministic transformation of the linearized tree. We find that TGs outperform various strong baselines on sentence-level language modeling perplexity, as well as on multiple syntax-sensitive language modeling evaluation metrics. Additionally, we find that the recursive syntactic composition bottleneck which represents each sentence as a single vector harms perplexity on document-level language modeling, providing evidence that a different kind of memory mechanism—one that is independent of composed syntactic representations—plays an important role in current successful models of long text. |
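The summary notes that TGs realize recursive syntactic composition through a special attention mask applied to a deterministically transformed linearization of the parse tree. The sketch below is only a rough illustration of that idea, not the authors' implementation (which, among other things, duplicates closing nonterminals and combines a composition-style attention pattern with a stack-style one). The action tokenization, function name, and NumPy usage are assumptions made for this example.

```python
# Minimal, illustrative sketch: a composition-style attention mask over a
# linearized parse tree. Assumption: tokens are opening nonterminals "(X",
# terminal words, and closing nonterminals "X)"; a closing token attends only
# to the constituent it closes, approximating recursive composition.

import numpy as np

def composition_mask(actions):
    n = len(actions)
    mask = np.zeros((n, n), dtype=bool)   # mask[i, j]: position i may attend to position j
    stack = []                            # indices of currently open constituents
    for i, tok in enumerate(actions):
        if tok.startswith("("):           # opening nonterminal
            mask[i, :i + 1] = True        # ordinary causal attention
            stack.append(i)
        elif tok.endswith(")"):           # closing nonterminal: compose
            start = stack.pop()
            mask[i, start:i + 1] = True   # attend only to the span being closed
        else:                             # terminal word
            mask[i, :i + 1] = True        # ordinary causal attention
    return mask

actions = ["(S", "(NP", "the", "dog", "NP)", "(VP", "barks", "VP)", "S)"]
print(composition_mask(actions).astype(int))
```

In this toy mask, "NP)" attends only to "(NP the dog NP)" and "S)" attends to the whole sentence span, mimicking the bottleneck in which each closed constituent is summarized before later tokens can use it.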
ISSN: | 2307-387X |
DOI: | 10.1162/tacl_a_00526 |