Revisiting Cross-Lingual Summarization: A Corpus-based Study and A New Benchmark with Improved Annotation
Yulong Chen, Huajian Zhang, Yijie Zhou, Xuefeng Bai, Yueguan Wang, Ming Zhong, Jianhao Yan, Yafu Li, Judy Li, Xianchao Zhu, Yue Zhang
Main: Summarization Main-poster Paper
Session 4: Summarization (Virtual Poster)
Conference Room: Pier 7&8
Conference Time: July 11, 11:00-12:30 (EDT) (America/Toronto)
Global Time: July 11, Session 4 (15:00-16:30 UTC)
Keywords:
multilingual summarisation
Languages:
Mandarin, French
Abstract:
Most existing cross-lingual summarization (CLS) work constructs CLS corpora by directly translating pre-annotated summaries from one language to another, an approach that can carry over errors from both the summarization and the translation processes.
To address this issue, we propose ConvSumX, a cross-lingual conversation summarization benchmark, through a new annotation schema that explicitly considers source input context.
ConvSumX consists of 2 sub-tasks under different real-world scenarios, with each covering 3 language directions.
We conduct a thorough analysis of ConvSumX and 3 widely-used manually annotated CLS corpora and empirically find that ConvSumX is more faithful to the input text.
Additionally, based on the same intuition, we propose a 2-Step method, which takes both the conversation and the summary as input to simulate the human annotation process.
Experimental results show that the 2-Step method surpasses strong baselines on ConvSumX under both automatic and human evaluation.
Analysis shows that both the source input text and the summary are crucial for modeling cross-lingual summaries.
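
The 2-Step idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not state the models, the input order, or the separator, so the `[SEP]` separator, the summary-first ordering, and all function names below (`build_step2_input`, `two_step_summarize`, `toy_summarizer`, `toy_generator`) are assumptions made only for illustration. The point it shows is that the second step sees both the source conversation and the mono-lingual summary, rather than the summary alone.

```python
from typing import Callable

# Hypothetical separator; the abstract does not specify the real input format.
SEP = " [SEP] "


def build_step2_input(conversation: str, summary: str, sep: str = SEP) -> str:
    """Join the mono-lingual summary and the source conversation into one input.

    The summary-first ordering is an assumption, not taken from the paper.
    """
    return summary.strip() + sep + conversation.strip()


def two_step_summarize(
    conversation: str,
    summarize: Callable[[str], str],
    generate_target: Callable[[str], str],
) -> str:
    """Run a two-step cross-lingual summarization pipeline.

    Step 1: produce a summary in the source language.
    Step 2: produce the target-language summary from summary + conversation.
    Both callables are placeholders for whatever models are actually used.
    """
    source_summary = summarize(conversation)
    step2_input = build_step2_input(conversation, source_summary)
    return generate_target(step2_input)


def toy_summarizer(conversation: str) -> str:
    """Stand-in for a real summarizer: returns the first utterance."""
    return conversation.splitlines()[0]


def toy_generator(text: str) -> str:
    """Stand-in for a real cross-lingual generator: tags the input."""
    return "<zh> " + text


if __name__ == "__main__":
    conv = "A: Lunch at noon?\nB: Sure, see you then."
    print(two_step_summarize(conv, toy_summarizer, toy_generator))
```

In practice the two callables would wrap trained models; a translate-only pipeline would correspond to passing just the summary to the second step, which is the setup the abstract argues against.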