Interpreting Positional Information in Perspective of Word Order
Zhang Xilong, Liu Ruochen, Liu Jin, Liang Xuefeng
Main: Interpretability and Analysis of Models for NLP Main-poster Paper
Session 1: Interpretability and Analysis of Models for NLP (Virtual Poster)
Conference Room: Pier 7&8
Conference Time: July 10, 11:00-12:30 (EDT) (America/Toronto)
Global Time: July 10, Session 1 (15:00-16:30 UTC)
Keywords:
probing
TLDR:
The attention mechanism is a powerful and effective method utilized in natural language processing. However, it has been observed that this method is insensitive to positional information. Although several studies have attempted to improve positional encoding and investigate the influence of word or...
You can open the
#paper-P3410
channel in a separate window.
Abstract:
The attention mechanism is a powerful and effective method utilized in natural language processing. However, it has been observed that this method is insensitive to positional information. Although several studies have attempted to improve positional encoding and investigate the influence of word order perturbation, it remains unclear how positional encoding impacts NLP models from the perspective of word order. In this paper, we aim to shed light on this problem by analyzing the working mechanism of the attention module and investigating the root cause of its inability to encode positional information. Our hypothesis is that the insensitivity can be attributed to the weight sum operation utilized in the attention module. To verify this hypothesis, we propose a novel weight concatenation operation and evaluate its efficacy in neural machine translation tasks. Our enhanced experimental results not only reveal that the proposed operation can effectively encode positional information but also confirm our hypothesis.