Soft Language Clustering for Multilingual Model Pre-training

Jiali Zeng; Yufan Jiang; Yongjing Yin; Yi Jing; Fandong Meng; Binghuai Lin; Yunbo Cao; Jie Zhou

Main: Multilingualism and Cross-Lingual NLP Main-poster Paper

Poster Session 1: Multilingualism and Cross-Lingual NLP (Poster)

Conference Room: Frontenac Ballroom and Queen's Quay

Conference Time: July 10, 11:00-12:30 (EDT) (America/Toronto)

Global Time: July 10, Poster Session 1 (15:00-16:30 UTC)

Keywords: multilingual pre-training

Abstract: Multilingual pre-trained language models have demonstrated impressive (zero-shot) cross-lingual transfer abilities; however, their performance is hindered when the target language is typologically distant from the source language or when pre-training data is limited in size. In this paper, we propose XLM-P, a method that contextually retrieves prompts as flexible guidance for encoding instances conditionally. Our space-efficient and model-agnostic XLM-P approach enables (1) lightweight modeling of language-invariant and language-specific knowledge across languages, and (2) easy integration with other multilingual pre-training methods. On the tasks of XTREME, which include text classification, sequence labeling, question answering, and sentence retrieval, both base- and large-size language models pre-trained with our proposed method exhibit consistent performance improvements. Furthermore, the method provides substantial advantages for low-resource languages in unsupervised sentence retrieval and for target languages that differ greatly from the source language in cross-lingual transfer.
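
The abstract does not spell out how prompts are retrieved, so the following is only a minimal sketch of one plausible reading, not the authors' implementation. It assumes a small pool of learnable prompt vectors, each with a key; an instance is summarized by mean-pooling its token embeddings, the summary is scored against the keys, and the softmax-weighted mixture of prompts is prepended to the input. All names (`retrieve_prompt`, `prompt_keys`, `prompt_values`, `prepend_prompt`), the mean-pooling query, and the pool size are assumptions made for illustration.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def mean_pool(token_embeddings):
    """Average the token vectors into one instance-level query (assumed choice)."""
    n = len(token_embeddings)
    dim = len(token_embeddings[0])
    return [sum(tok[d] for tok in token_embeddings) / n for d in range(dim)]


def retrieve_prompt(token_embeddings, prompt_keys, prompt_values):
    """Soft retrieval: weight every prompt in the pool by its match to the instance."""
    query = mean_pool(token_embeddings)
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in prompt_keys]
    weights = softmax(scores)
    dim = len(prompt_values[0])
    prompt = [sum(w * v[d] for w, v in zip(weights, prompt_values)) for d in range(dim)]
    return prompt, weights


def prepend_prompt(token_embeddings, prompt):
    """Place the retrieved prompt in front of the tokens before encoding."""
    return [prompt] + token_embeddings


# Toy data standing in for learned parameters and real token embeddings.
random.seed(0)
DIM, POOL_SIZE, SEQ_LEN = 8, 4, 5
prompt_keys = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(POOL_SIZE)]
prompt_values = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(POOL_SIZE)]
tokens = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(SEQ_LEN)]

prompt, weights = retrieve_prompt(tokens, prompt_keys, prompt_values)
encoder_input = prepend_prompt(tokens, prompt)
```

Because the weights form a distribution over the whole pool rather than a hard choice, instances from related languages can share prompts to varying degrees, which is one way to read the "soft language clustering" of the title. The pool adds only `POOL_SIZE` key and value vectors, in line with the abstract's description of the approach as space-efficient.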