XDailyDialog: A Multilingual Parallel Dialogue Corpus
Zeming Liu, Ping Nie, Jie Cai, Haifeng Wang, Zheng-Yu Niu, Peng Zhang, Mrinmaya Sachan, Kaiping Peng
Main: Dialogue and Interactive Systems Main-poster Paper
Session 4: Dialogue and Interactive Systems (Virtual Poster)
Conference Room: Pier 7&8
Conference Time: July 11, 11:00-12:30 (EDT) (America/Toronto)
Global Time: July 11, Session 4 (15:00-16:30 UTC)
Keywords:
multilingual / low resource
Abstract:
High-quality datasets are essential to the development of dialogue models.
However, most existing datasets for open-domain dialogue modeling are limited to a single language.
The absence of multilingual open-domain dialog datasets not only limits the research on multilingual or cross-lingual transfer learning, but also hinders the development of robust open-domain dialog systems that can be deployed in other parts of the world.
In this paper, we provide a multilingual parallel open-domain dialog dataset, XDailyDialog, to enable researchers to explore the challenging task of multilingual and cross-lingual open-domain dialog.
XDailyDialog includes 13K dialogues aligned across 4 languages (52K dialogues and 410K utterances in total).
We then propose a dialog generation model, kNN-Chat, which has a novel kNN-search mechanism to support unified response retrieval for monolingual, multilingual, and cross-lingual dialogue.
Experimental results demonstrate the effectiveness of this framework.
We will make XDailyDialog and kNN-Chat publicly available soon.
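
The abstract describes kNN-Chat only at a high level, so the following is a minimal sketch of the general idea of nearest-neighbour response retrieval over a shared embedding space, not the authors' implementation. The function names, the toy two-dimensional vectors, the language tags, and the optional language filter are all assumptions made for illustration; a real system would obtain the vectors from a multilingual encoder.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def knn_retrieve(query_vec, datastore, k=2, lang=None):
    """Return the k responses whose context vectors are closest to query_vec.

    datastore: list of (context_vector, language_tag, response) tuples.
    lang: if given, restrict candidates to that language (monolingual case);
          if None, search across all languages (multilingual / cross-lingual case).
    """
    candidates = [e for e in datastore if lang is None or e[1] == lang]
    ranked = sorted(candidates, key=lambda e: cosine(query_vec, e[0]), reverse=True)
    return [(e[1], e[2]) for e in ranked[:k]]


# Hypothetical toy datastore: hand-made vectors, arbitrary language tags.
DATASTORE = [
    ([1.0, 0.0], "en", "Nice to meet you."),
    ([0.9, 0.1], "de", "Freut mich, dich kennenzulernen."),
    ([0.0, 1.0], "en", "The weather is lovely today."),
    ([0.1, 0.9], "it", "Oggi il tempo è bello."),
]

if __name__ == "__main__":
    # Search across all languages, then within a single language.
    print(knn_retrieve([1.0, 0.0], DATASTORE, k=2))
    print(knn_retrieve([1.0, 0.0], DATASTORE, k=1, lang="it"))
```

Keeping every language in a single datastore and treating the language restriction as an optional filter is one simple way to serve monolingual, multilingual, and cross-lingual queries with the same search routine; whether kNN-Chat does this is not stated in the abstract.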