Real-World Compositional Generalization with Disentangled Sequence-to-Sequence Learning
Hao Zheng, Mirella Lapata
Findings: Machine Learning for NLP Findings Paper
Session 7: Machine Learning for NLP (Virtual Poster)
Conference Room: Pier 7&8
Conference Time: July 12, 11:00-12:30 (EDT) (America/Toronto)
Global Time: July 12, Session 7 (15:00-16:30 UTC)
Keywords:
generalization
TLDR:
We improve the Disentangled sequence-to-sequence model (Dangle) by disentangling key and value representations and re-encoding keys only periodically, yielding better compositional generalization at lower compute and memory cost, and introduce a new machine translation benchmark built from naturally occurring compositional patterns.
Abstract:
Compositional generalization is a basic mechanism in human language
learning, which current neural networks struggle with. A recently
proposed Disentangled sequence-to-sequence model
(Dangle) shows promising generalization capability by learning
specialized encodings for each decoding step. We introduce two key
modifications to this model which encourage more disentangled
representations and improve its compute and memory efficiency,
allowing us to tackle compositional generalization in a more
realistic setting. Specifically, instead of adaptively re-encoding
source keys and values at each time step, we disentangle their
representations and only re-encode keys periodically, at some
interval. Our new architecture leads to better generalization
performance across existing tasks and datasets, and on a new machine
translation benchmark which we create by detecting naturally occurring
compositional patterns relative to a training set. We show that this
methodology better emulates real-world requirements than artificial
challenges.
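The abstract's core architectural idea, disentangling source keys from values and re-encoding keys only every few decoding steps rather than at every step, can be illustrated with a small sketch. This is not the authors' implementation: the toy `key_encoder` and `value_encoder` functions below are hypothetical stand-ins for the Transformer encoders used in the paper, chosen only to make the caching pattern concrete.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

# Hypothetical toy encoders; in the actual model these are Transformer
# encoders, and key re-encoding is conditioned on the decoding history.
def key_encoder(src, step):
    return src * (1.0 + 0.1 * step)

def value_encoder(src):
    return src + 1.0

def decode(src, queries, interval):
    """Values are disentangled from keys and encoded once up front;
    keys are adaptively re-encoded only every `interval` steps
    (the original Dangle re-encodes at every step)."""
    values = value_encoder(src)          # encoded once, then reused
    keys = None
    reencode_count = 0
    outputs = []
    for t, q in enumerate(queries):
        if t % interval == 0:            # periodic key re-encoding
            keys = key_encoder(src, t)
            reencode_count += 1
        attn = softmax(q @ keys.T)       # attention over source positions
        outputs.append(attn @ values)    # context vector for step t
    return np.stack(outputs), reencode_count

rng = np.random.default_rng(0)
src = rng.normal(size=(5, 4))            # 5 source tokens, dim 4
queries = rng.normal(size=(8, 4))        # 8 decoding steps
out, n_reenc = decode(src, queries, interval=3)
print(out.shape, n_reenc)                # keys rebuilt at steps 0, 3, 6
```

With `interval=3`, the key encoder runs 3 times instead of 8, which is the source of the compute and memory savings the abstract mentions; setting `interval=1` recovers per-step re-encoding.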