On-the-fly Cross-lingual Masking for Multilingual Pre-training
Xi Ai, Bin Fang
Main: Multilingualism and Cross-Lingual NLP Main-poster Paper
Session 4: Multilingualism and Cross-Lingual NLP (Virtual Poster)
Conference Room: Pier 7&8
Conference Time: July 11, 11:00-12:30 (EDT) (America/Toronto)
Global Time: July 11, Session 4 (15:00-16:30 UTC)
Keywords:
multilingual pre-training
Abstract:
In multilingual pre-training with an MLM (masked language modeling) objective on multiple monolingual corpora, models learn cross-linguality only implicitly, from isomorphic spaces formed by overlapping language spaces, because there is no explicit cross-lingual forward pass. In this work, we present CLPM (Cross-lingual Prototype Masking), a dynamic, token-wise masking scheme for multilingual pre-training that replaces a random token $x$ in the input sentence with a special token $[\mathcal{C}]_{x}$. Because $[\mathcal{C}]_{x}$ is a cross-lingual prototype for $x$, the replacement forms an explicit cross-lingual forward pass. We instantiate CLPM for the multilingual pre-training phase of UNMT (unsupervised neural machine translation), and experiments show that CLPM consistently improves the performance of UNMT models on $\{De, Ro, Ne\} \leftrightarrow En$. Beyond UNMT and bilingual tasks, we show that CLPM also consistently improves the performance of multilingual models on cross-lingual classification.
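The token-wise replacement described in the abstract can be illustrated with a minimal sketch. The abstract does not say how the prototype $[\mathcal{C}]_{x}$ is computed, so the definition below (a softmax-weighted average of other-language token embeddings, scored by dot product with the embedding of $x$) is an assumption made purely for illustration, not the authors' method. The names `cross_lingual_prototype` and `clpm_mask`, the `mask_prob` parameter, and the toy embedding table are all hypothetical.

```python
import math
import random


def cross_lingual_prototype(x_vec, other_lang_vecs):
    """Assumed prototype: softmax-weighted average of other-language embeddings."""
    # Dot-product similarity between x and every other-language embedding.
    scores = [sum(a * b for a, b in zip(x_vec, v)) for v in other_lang_vecs]
    top = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [
        sum(w * v[d] for w, v in zip(weights, other_lang_vecs)) / total
        for d in range(len(x_vec))
    ]


def clpm_mask(token_ids, embeddings, other_lang_ids, mask_prob=0.15, rng=None):
    """Replace randomly chosen tokens with their (assumed) cross-lingual prototype.

    Returns the input vectors and, per position, the original token id to
    predict (None where the token was left unchanged).
    """
    rng = rng or random.Random()
    other_vecs = [embeddings[i] for i in other_lang_ids]
    inputs, targets = [], []
    for tok in token_ids:
        vec = embeddings[tok]
        if rng.random() < mask_prob:
            # Computed on the fly from the current embedding table.
            inputs.append(cross_lingual_prototype(vec, other_vecs))
            targets.append(tok)
        else:
            inputs.append(list(vec))
            targets.append(None)
    return inputs, targets


# Toy embedding table: ids 0-1 stand for one language, ids 2-3 for another.
toy_embeddings = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [1.0, 0.0], 3: [0.0, 1.0]}
masked_inputs, masked_targets = clpm_mask(
    [0, 1], toy_embeddings, other_lang_ids=[2, 3], mask_prob=1.0
)
```

In this sketch the prototype is recomputed from the current embeddings each time a token is selected, which is one way to read "dynamic" in the abstract. A real pre-training setup would operate on batched tensors inside the model rather than on Python lists.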