The HW-TSC's Simultaneous Speech-to-Speech Translation System for IWSLT 2023 Evaluation

Hengchao Shang, Zhiqiang Rao, Zongyao Li, Zhanglin Wu, Jiaxin Guo, Minghan Wang, Daimeng Wei, Shaojun Li, Zhengzhe Yu, Xiaoyu Chen, Lizhi Lei, Hao Yang

The 20th International Conference on Spoken Language Translation (Long Paper)

TLDR: In this paper, we present our submission to the IWSLT 2023 Simultaneous Speech-to-Speech Translation competition. Our participation involves three language directions: English-German, English-Chinese, and English-Japanese. Our solution is a cascaded incremental decoding system, consisting of an ASR model, an MT model, and a TTS model.
Abstract: In this paper, we present our submission to the IWSLT 2023 Simultaneous Speech-to-Speech Translation competition. Our participation involves three language directions: English-German, English-Chinese, and English-Japanese. Our solution is a cascaded incremental decoding system, consisting of an ASR model, an MT model, and a TTS model. By adopting the strategies used in the Speech-to-Text track, we have managed to generate a more confident target text for each audio segment input, which can guide the next MT incremental decoding process. Additionally, we have integrated the TTS model to seamlessly reproduce audio files from the translation hypothesis. To enhance the effectiveness of our experiment, we have utilized a range of methods to reduce error conditions in the TTS input text and improve the smoothness of the TTS output audio.
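The cascaded incremental decoding described in the abstract can be sketched as a loop that re-runs ASR and MT on each new audio segment, carries a committed target prefix across steps as the "confident target text," and hands only newly committed tokens to TTS. This is a minimal illustrative sketch: the stub components, the commit-all-but-last-token stability heuristic, and every function name here are assumptions for illustration, not the authors' actual models or policy.

```python
# Hypothetical sketch of a cascaded incremental decoding pipeline
# (ASR -> MT -> TTS). All components below are toy stand-ins.

def asr(audio_so_far):
    # Stand-in ASR: pretend each audio chunk transcribes to one word.
    return " ".join(f"word{i}" for i in range(len(audio_so_far)))

def mt(source_text, committed_prefix):
    # Stand-in MT: translate token-by-token, reusing the already
    # committed target prefix so earlier output is never retracted.
    hypothesis = [f"tgt_{t}" for t in source_text.split()]
    return committed_prefix + hypothesis[len(committed_prefix):]

def tts(new_target_tokens):
    # Stand-in TTS: emit one synthetic "audio file" per committed token.
    return [f"<audio:{t}>" for t in new_target_tokens]

def simultaneous_pipeline(audio_chunks):
    committed = []   # confident target prefix carried across steps
    audio_out = []
    seen = []
    for chunk in audio_chunks:
        seen.append(chunk)
        transcript = asr(seen)
        hypothesis = mt(transcript, committed)
        # Commit everything except the final token, which may still
        # change as more source audio arrives (a simple heuristic).
        if len(hypothesis) > len(committed):
            stable = hypothesis[:-1]
        else:
            stable = hypothesis
        new_tokens = stable[len(committed):]
        committed = stable
        audio_out.extend(tts(new_tokens))
    # Flush: commit any remaining tokens once the input has ended.
    final = mt(asr(seen), committed)
    audio_out.extend(tts(final[len(committed):]))
    return audio_out
```

Running the loop on three dummy chunks yields one synthesized segment per translated token, with each segment emitted as soon as its token becomes stable rather than waiting for the full utterance.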