On-the-fly Cross-lingual Masking for Multilingual Pre-training

Xi Ai, Bin Fang

Track: Multilingualism and Cross-Lingual NLP (Main, Poster Paper)

Session 4: Multilingualism and Cross-Lingual NLP (Virtual Poster)
Conference Room: Pier 7&8
Conference Time: July 11, 11:00-12:30 (EDT) (America/Toronto)
Global Time: July 11, Session 4 (15:00-16:30 UTC)
Keywords: multilingual pre-training
TLDR: CLPM replaces random input tokens with cross-lingual prototype tokens $[\mathcal{C}]_{x}$ during multilingual pre-training, forming an explicit cross-lingual forward pass that improves UNMT and cross-lingual classification.
Abstract: In multilingual pre-training with a masked language modeling (MLM) objective on multiple monolingual corpora, multilingual models learn cross-linguality only implicitly, from the isomorphic spaces formed by overlapping different language spaces, because there is no explicit cross-lingual forward pass. In this work, we present CLPM (Cross-lingual Prototype Masking), a dynamic and token-wise masking scheme for multilingual pre-training that uses a special token $[\mathcal{C}]_{x}$ to replace a random token $x$ in the input sentence. $[\mathcal{C}]_{x}$ is a cross-lingual prototype for $x$ and thus forms an explicit cross-lingual forward pass. We instantiate CLPM for the multilingual pre-training phase of UNMT (unsupervised neural machine translation), and experiments show that CLPM consistently improves the performance of UNMT models on $\{De, Ro, Ne\} \leftrightarrow En$. Beyond UNMT and bilingual tasks, we show that CLPM consistently improves the performance of multilingual models on cross-lingual classification.
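To make the masking scheme concrete, here is a minimal sketch of token-wise cross-lingual masking. It is not the paper's implementation: the names `clpm_mask` and `lexicon` are hypothetical, and a static bilingual lexicon stands in for the paper's dynamically computed prototypes $[\mathcal{C}]_{x}$, which are built on the fly rather than looked up from a fixed table. The sketch only shows the control flow: a random subset of tokens is replaced by a cross-lingual stand-in (falling back to an ordinary mask token), while the model is still trained to predict the original tokens.

```python
import random

def clpm_mask(tokens, lexicon, mask_prob=0.15, mask_token="[MASK]"):
    """Token-wise masking sketch: each selected token is replaced by a
    cross-lingual stand-in if one is available, else by [MASK].

    NOTE: `lexicon` is a hypothetical static word-translation table used
    here as a stand-in for the paper's dynamically computed prototype
    [C]_x; the actual CLPM prototype construction may differ.
    """
    masked, targets = [], []
    for tok in tokens:
        if random.random() < mask_prob:
            proto = lexicon.get(tok)  # cross-lingual stand-in, if any
            masked.append(f"[C]{proto}" if proto else mask_token)
            targets.append(tok)  # the model must still predict the original token
        else:
            masked.append(tok)
            targets.append(None)  # not a prediction target
    return masked, targets

# Toy usage: a German sentence with a small De->En lexicon.
lexicon = {"Haus": "house", "Katze": "cat"}
print(clpm_mask("die Katze ist im Haus".split(), lexicon, mask_prob=0.5))
```

The key point the sketch illustrates is that the prediction target remains the original token $x$ even when its input slot is occupied by a cross-lingual stand-in, which is what creates a cross-lingual forward pass inside an otherwise monolingual MLM batch.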