How poor is the stimulus? Evaluating hierarchical generalization in neural networks trained on child-directed speech

Aditya Yedetore; Tal Linzen; Robert Frank; R. Thomas McCoy

Main: Linguistic Theories, Cognitive Modeling, and Psycholinguistics (Main-poster Paper)

Poster Session 4: Linguistic Theories, Cognitive Modeling, and Psycholinguistics (Poster)

Conference Room: Frontenac Ballroom and Queen's Quay

Conference Time: July 11, 11:00-12:30 (EDT) (America/Toronto)

Global Time: July 11, Poster Session 4 (15:00-16:30 UTC)

Keywords: cognitive modeling

Abstract: When acquiring syntax, children consistently choose hierarchical rules over competing non-hierarchical possibilities. Is this preference due to a learning bias for hierarchical structure, or due to more general biases that interact with hierarchical cues in children's linguistic input? We explore these possibilities by training LSTMs and Transformers (two types of neural networks without a hierarchical bias) on data similar in quantity and content to children's linguistic input: text from the CHILDES corpus. We then evaluate what these models have learned about English yes/no questions, a phenomenon for which hierarchical structure is crucial. We find that, although both model types capture the surface statistics of child-directed speech well (as measured by perplexity), they generalize in a way more consistent with an incorrect linear rule than with the correct hierarchical rule. These results suggest that human-like generalization from text alone requires stronger biases than the general sequence-processing biases of standard neural network architectures.
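The contrast between the linear and hierarchical rules for yes/no questions can be made concrete with a toy example. The Python sketch below is a hypothetical illustration, not the authors' evaluation code: it assumes pre-tokenized, lowercased sentences without punctuation, a small hand-picked auxiliary list (`AUXILIARIES`), and a main-clause auxiliary position supplied by hand in place of a real syntactic parser. The names `linear_rule` and `hierarchical_rule` are illustrative only.

```python
# Hypothetical illustration (not the paper's code): two candidate rules for
# forming an English yes/no question from a declarative sentence.
# Assumptions: sentences are pre-tokenized and lowercased, and the position of
# the main-clause auxiliary is given by hand, standing in for a parse.

AUXILIARIES = {"is", "are", "was", "were", "can", "will", "does", "do"}


def linear_rule(tokens: list[str]) -> list[str]:
    """Front the linearly first auxiliary (the incorrect rule)."""
    idx = next(i for i, tok in enumerate(tokens) if tok in AUXILIARIES)
    return [tokens[idx]] + tokens[:idx] + tokens[idx + 1:]


def hierarchical_rule(tokens: list[str], main_aux_index: int) -> list[str]:
    """Front the main-clause auxiliary (the correct rule).

    main_aux_index is assumed to come from a syntactic parse; here it is
    supplied by hand.
    """
    if tokens[main_aux_index] not in AUXILIARIES:
        raise ValueError("main_aux_index does not point at an auxiliary")
    return ([tokens[main_aux_index]] + tokens[:main_aux_index]
            + tokens[main_aux_index + 1:])


if __name__ == "__main__":
    declarative = "the boy who is happy is smiling".split()
    # The main-clause auxiliary is the second "is" (index 5); the first "is"
    # belongs to the relative clause "who is happy".
    print(" ".join(linear_rule(declarative)))
    # -> is the boy who happy is smiling
    print(" ".join(hierarchical_rule(declarative, main_aux_index=5)))
    # -> is the boy who is happy smiling
```

On a sentence with a single auxiliary, such as "the boy is smiling", the two functions return the same question; they diverge only when another auxiliary precedes the main-clause one, as in the relative-clause example. Input dominated by simple sentences is therefore compatible with both rules, which is why the abstract's question of which rule a learner adopts is informative about its biases.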