[SRW] Multimodal Neural Machine Translation Using Synthetic Images Transformed by Latent Diffusion Model

Ryoya Yuasa, Akihiro Tamura, Tomoyuki Kajiwara, Takashi Ninomiya, Tsuneo Kato

Student Research Workshop Srw Paper

Session 3: Student Research Workshop (Poster)
Conference Room: Frontenac Ballroom and Queen's Quay
Conference Time: July 11, 09:00-10:30 (EDT) (America/Toronto)
Global Time: July 11, Session 3 (13:00-14:30 UTC)
TLDR: This study proposes a new multimodal neural machine translation (MNMT) model using synthetic images transformed by a latent diffusion model. MNMT translates a source language sentence based on its related image, but the image usually contains noisy information that are not relevant to the source lan...
You can open the #paper-S28 channel in a separate window.
Abstract: This study proposes a new multimodal neural machine translation (MNMT) model using synthetic images transformed by a latent diffusion model. MNMT translates a source language sentence based on its related image, but the image usually contains noisy information that are not relevant to the source language sentence.Our proposed method first generates a synthetic image corresponding to the content of the source language sentence by using a latent diffusion model and then performs translation based on the synthetic image. The experiments on the English-German translation tasks using the Multi30k dataset demonstrate the effectiveness of the proposed method.