Transformer Grammars: Augmenting Transformer Language Models with Syntactic Inductive Biases at Scale

Bibliographic Details
Published in: Transactions of the Association for Computational Linguistics, Vol. 10, pp. 1423–1439
Main Authors: Sartran, Laurent; Barrett, Samuel; Kuncoro, Adhiguna; Stanojević, Miloš; Blunsom, Phil; Dyer, Chris
Format: Journal Article
Language: English
Published: MIT Press, One Broadway, 12th Floor, Cambridge, Massachusetts 02142, USA, 22 December 2022
Subjects: Bias; Composition; Computational linguistics; Fashion models; Grammars; Language; Language modeling; Linguistics; Modelling; Neural networks; Recursion; Sentences; Success; Syntax; Transformers; Trees
Online Access:
https://direct.mit.edu/tacl/article/doi/10.1162/tacl_a_00526
https://www.proquest.com/docview/2893948553
https://doaj.org/article/464a071ead4d45b6a951cc29dee61e9e

Abstract: We introduce Transformer Grammars (TGs), a novel class of Transformer language models that combine (i) the expressive power, scalability, and strong performance of Transformers and (ii) recursive syntactic compositions, which here are implemented through a special attention mask and deterministic transformation of the linearized tree. We find that TGs outperform various strong baselines on sentence-level language modeling perplexity, as well as on multiple syntax-sensitive language modeling evaluation metrics. Additionally, we find that the recursive syntactic composition bottleneck which represents each sentence as a single vector harms perplexity on document-level language modeling, providing evidence that a different kind of memory mechanism—one that is independent of composed syntactic representations—plays an important role in current successful models of long text.
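The abstract's "special attention mask and deterministic transformation of the linearized tree" can be made concrete with a small illustration. The sketch below is not the paper's exact scheme (Transformer Grammars additionally duplicate each closing nonterminal so that a "compose" step and a subsequent "stack" step see different attention views); it only shows the core idea under a simplified assumption: on a bracketed linearization of a parse tree, a closing token attends exclusively to the constituent it closes, so that constituent is summarized at a single position, while all other tokens use an ordinary causal mask. The function name `composition_mask` and the token format are hypothetical.

```python
# Toy illustration of a composition-style attention mask over a
# linearized parse tree. Simplified relative to the paper's actual
# Transformer Grammar masking; see the caveats above.
import numpy as np

def composition_mask(tokens):
    """Return an n x n boolean mask; mask[i, j] = True means
    position i may attend to position j."""
    n = len(tokens)
    mask = np.zeros((n, n), dtype=bool)
    stack = []  # indices of currently open constituents
    for i, tok in enumerate(tokens):
        if tok.startswith("("):       # opening nonterminal, e.g. "(NP"
            stack.append(i)
            mask[i, : i + 1] = True   # ordinary causal attention
        elif tok == ")":              # closing token: "compose" step
            start = stack.pop()
            # Attend only to the just-closed constituent's span, so
            # this position becomes its single-vector summary.
            mask[i, start : i + 1] = True
        else:                         # terminal word
            mask[i, : i + 1] = True
    return mask

# "(S (NP the dog) (VP barks))" as a token sequence:
toks = ["(S", "(NP", "the", "dog", ")", "(VP", "barks", ")", ")"]
print(composition_mask(toks).astype(int))
```

Running this prints a causal-style matrix in which the rows for the three ")" tokens attend only to the spans opened at positions 1, 5, and 0 respectively, mirroring how recursive composition bottlenecks each constituent into a single representation.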
Authors:
– Laurent Sartran, DeepMind, UK (lsartran@deepmind.com)
– Samuel Barrett, University of Oxford, UK (samuelbarrett1234@btinternet.com)
– Adhiguna Kuncoro, University of Oxford, UK (akuncoro@deepmind.com)
– Miloš Stanojević, DeepMind, UK (stanojevic@deepmind.com)
– Phil Blunsom, University of Oxford, UK (phil.blunsom@cs.ox.ac.uk)
– Chris Dyer, DeepMind, UK (cdyer@deepmind.com)
Copyright 2022. This work is published under https://creativecommons.org/licenses/by/4.0/legalcode (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.
DOI: 10.1162/tacl_a_00526
EISSN: 2307-387X
ISSN: 2307-387X
Open Access: yes
Peer Reviewed: yes