Transformer Grammars: Augmenting Transformer Language Models with Syntactic Inductive Biases at Scale

Bibliographic Details
Published in: Transactions of the Association for Computational Linguistics, Vol. 10, pp. 1423–1439
Main Authors: Sartran, Laurent; Barrett, Samuel; Kuncoro, Adhiguna; Stanojević, Miloš; Blunsom, Phil; Dyer, Chris
Format: Journal Article
Language: English
Published: MIT Press, One Broadway, 12th Floor, Cambridge, Massachusetts 02142, USA, 22 December 2022
Subjects: Bias; Composition; Computational linguistics; Fashion models; Grammars; Language; Language modeling; Linguistics; Modelling; Neural networks; Recursion; Sentences; Success; Syntax; Transformers; Trees
Online Access:
https://direct.mit.edu/tacl/article/doi/10.1162/tacl_a_00526
https://www.proquest.com/docview/2893948553
https://doaj.org/article/464a071ead4d45b6a951cc29dee61e9e

Abstract: We introduce Transformer Grammars (TGs), a novel class of Transformer language models that combine (i) the expressive power, scalability, and strong performance of Transformers and (ii) recursive syntactic compositions, which here are implemented through a special attention mask and deterministic transformation of the linearized tree. We find that TGs outperform various strong baselines on sentence-level language modeling perplexity, as well as on multiple syntax-sensitive language modeling evaluation metrics. Additionally, we find that the recursive syntactic composition bottleneck which represents each sentence as a single vector harms perplexity on document-level language modeling, providing evidence that a different kind of memory mechanism—one that is independent of composed syntactic representations—plays an important role in current successful models of long text.
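The abstract's "special attention mask and deterministic transformation of the linearized tree" can be made concrete with a small illustration. The sketch below is not the paper's exact scheme (Transformer Grammars additionally duplicate each closing nonterminal so that a "compose" step and a subsequent "stack" step see different attention views); it only shows the core idea under a simplified assumption: on a bracketed linearization of a parse tree, a closing token attends exclusively to the constituent it closes, so that constituent is summarized at a single position, while all other tokens use an ordinary causal mask. The function name `composition_mask` and the token format are hypothetical.

```python
# Toy illustration of a composition-style attention mask over a
# linearized parse tree. Simplified relative to the paper's actual
# Transformer Grammar masking; see the caveats above.
import numpy as np

def composition_mask(tokens):
    """Return an n x n boolean mask; mask[i, j] = True means
    position i may attend to position j."""
    n = len(tokens)
    mask = np.zeros((n, n), dtype=bool)
    stack = []  # indices of currently open constituents
    for i, tok in enumerate(tokens):
        if tok.startswith("("):       # opening nonterminal, e.g. "(NP"
            stack.append(i)
            mask[i, : i + 1] = True   # ordinary causal attention
        elif tok == ")":              # closing token: "compose" step
            start = stack.pop()
            # Attend only to the just-closed constituent's span, so
            # this position becomes its single-vector summary.
            mask[i, start : i + 1] = True
        else:                         # terminal word
            mask[i, : i + 1] = True
    return mask

# "(S (NP the dog) (VP barks))" as a token sequence:
toks = ["(S", "(NP", "the", "dog", ")", "(VP", "barks", ")", ")"]
print(composition_mask(toks).astype(int))
```

Running this prints a causal-style matrix in which the rows for the three ")" tokens attend only to the spans opened at positions 1, 5, and 0 respectively, mirroring how recursive composition bottlenecks each constituent into a single representation.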
Authors:
– Laurent Sartran, DeepMind, UK (lsartran@deepmind.com)
– Samuel Barrett, University of Oxford, UK (samuelbarrett1234@btinternet.com)
– Adhiguna Kuncoro, University of Oxford, UK (akuncoro@deepmind.com)
– Miloš Stanojević, DeepMind, UK (stanojevic@deepmind.com)
– Phil Blunsom, University of Oxford, UK (phil.blunsom@cs.ox.ac.uk)
– Chris Dyer, DeepMind, UK (cdyer@deepmind.com)
Copyright 2022. This work is published under https://creativecommons.org/licenses/by/4.0/legalcode (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.
DOI: 10.1162/tacl_a_00526
EISSN: 2307-387X
ISSN: 2307-387X
Open Access: yes
Peer Reviewed: yes