Word Acquisition in Neural Language Models

We investigate how neural language models acquire individual words during training, extracting learning curves and ages of acquisition for over 600 words on the MacArthur-Bates Communicative Development Inventory (Fenson et al., ). Drawing on studies of word acquisition in children, we evaluate mult...

Full description

Saved in:
Bibliographic Details
Published inTransactions of the Association for Computational Linguistics Vol. 10; pp. 1 - 16
Main Authors Chang, Tyler A., Bergen, Benjamin K.
Format Journal Article
LanguageEnglish
Published One Rogers Street, Cambridge, MA 02142-1209, USA MIT Press 28.01.2022
MIT Press Journals, The
The MIT Press
Subjects
Online AccessGet full text

Cover

Loading…
More Information
Summary:We investigate how neural language models acquire individual words during training, extracting learning curves and ages of acquisition for over 600 words on the MacArthur-Bates Communicative Development Inventory (Fenson et al., ). Drawing on studies of word acquisition in children, we evaluate multiple predictors for words’ ages of acquisition in LSTMs, BERT, and GPT-2. We find that the effects of concreteness, word length, and lexical class are pointedly different in children and language models, reinforcing the importance of interaction and sensorimotor experience in child language acquisition. Language models rely far more on word frequency than children, but, like children, they exhibit slower learning of words in longer utterances. Interestingly, models follow consistent patterns during training for both unidirectional and bidirectional models, and for both LSTM and Transformer architectures. Models predict based on unigram token frequencies early in training, before transitioning loosely to bigram probabilities, eventually converging on more nuanced predictions. These results shed light on the role of distributional learning mechanisms in children, while also providing insights for more human-like language acquisition in language models.
Bibliography:2022
ObjectType-Article-1
SourceType-Scholarly Journals-1
ObjectType-Feature-2
content type line 14
ISSN:2307-387X
2307-387X
DOI:10.1162/tacl_a_00444