Lego-MT: Learning Detachable Models for Massively Multilingual Machine Translation
Fei Yuan, Yinquan Lu, Wenhao Zhu, Lingpeng Kong, Lei Li, Yu Qiao, Jingjing Xu
Findings: Machine Translation Findings Paper
Session 7: Machine Translation (Virtual Poster)
Conference Room: Pier 7&8
Conference Time: July 12, 11:00-12:30 (EDT) (America/Toronto)
Global Time: July 12, 15:00-16:30 UTC / 15:00-16:30 GMT
Spotlight Session: Spotlight - Metropolitan Centre (Spotlight)
Conference Room: Metropolitan Centre
Conference Time: July 10, 19:00-21:00 (EDT) (America/Toronto)
Global Time: July 10, 23:00-01:00 UTC / 23:00-01:00 GMT +1d
Keywords:
multilingual mt
Abstract:
Multilingual neural machine translation (MNMT) aims to build a unified model for many language directions.
Existing monolithic models for MNMT encounter two challenges: parameter interference among languages and inefficient inference for large models.
In this paper, we revisit the classic multi-way structures and develop a detachable model by assigning each language (or group of languages) to an individual branch that supports plug-and-play training and inference. To address the needs of learning representations for all languages in a unified space, we propose a novel efficient training recipe, upon which we build an effective detachable model, Lego-MT.
For a fair comparison, we collect data from OPUS and build a translation benchmark covering 433 languages and 1.3B parallel data.
Experiments show that Lego-MT with 1.2B parameters brings an average gain of 3.2 spBLEU. It even outperforms M2M-100 with 12B parameters.
The proposed training recipe brings a 28.2× speedup over the conventional multi-way training method. Code and data repo: https://github.com/CONE-MT/Lego-MT.git
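
The plug-and-play idea in the abstract, where each language is assigned to its own branch that can be attached or removed independently, can be illustrated with a toy sketch. This is not the Lego-MT implementation: the class `DetachableMT`, its methods, and the character-code "shared representation" are all hypothetical stand-ins chosen only to show that inference touches just the source encoder and the target decoder.

```python
from typing import Callable, Dict, List

Shared = List[int]  # stand-in for a representation in the unified space


class DetachableMT:
    """Toy registry of per-language branches; hypothetical, not the Lego-MT API."""

    def __init__(self) -> None:
        self.encoders: Dict[str, Callable[[str], Shared]] = {}
        self.decoders: Dict[str, Callable[[Shared], str]] = {}

    def attach(self, lang: str) -> None:
        # Each language gets its own encoder/decoder pair (its "branch").
        self.encoders[lang] = lambda text: [ord(ch) for ch in text]
        self.decoders[lang] = lambda rep, tag=lang: f"<{tag}> " + "".join(map(chr, rep))

    def detach(self, lang: str) -> None:
        # Removing a branch leaves every other branch untouched.
        self.encoders.pop(lang, None)
        self.decoders.pop(lang, None)

    def translate(self, text: str, src: str, tgt: str) -> str:
        # Only the source encoder and the target decoder are needed at inference.
        return self.decoders[tgt](self.encoders[src](text))


model = DetachableMT()
for lang in ("en", "fr", "de"):
    model.attach(lang)

model.detach("de")                        # unplug a branch that is not needed
print(model.translate("hi", "en", "fr"))  # prints "<fr> hi"
```

In a real system each branch would be a trained neural encoder or decoder; the sketch only mirrors the routing, in which a translation direction selects two branches and the rest need not be loaded.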