Novalign: Neural Cross-Lingual Sentence Alignment for Novels

Francesco Molfese, Andrei Stefan Bejgu, Simone Tedeschi, Roberto Navigli

The 5th Workshop on Narrative Understanding N/a Paper

Abstract: Sentence alignment -- establishing links between corresponding sentences in two related documents -- is as important in paraphrase generation as it is in machine translation. Despite its applicability, its benefits are often overlooked in the context of narrative understanding. For instance, it can be leveraged in cross-lingual story analysis and cultural analytics, which includes identifying similarities and differences between narratives across languages and understanding narrative structures more comprehensively. To bridge this gap, we introduce a novel methodology for sentence alignment designed specifically for novels. In particular, we propose Novalign, an end-to-end, fully neural architecture that maps source sentences to target sentences based on their contextualized sentence embeddings. We extensively evaluate Novalign on a new multilingual dataset derived from the OPUS project, consisting of 20 language pairs, and demonstrate that our model achieves state-of-the-art performance. To ensure reproducibility, we release our code and model checkpoints at omitted.link.
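To make the idea of embedding-based alignment concrete, the following is a minimal sketch of linking source and target sentences by the cosine similarity of their embeddings. It is not the Novalign architecture: the greedy nearest-neighbour rule, the `cosine` and `align` helpers, the `threshold` parameter, and the toy three-dimensional vectors are all assumptions made for illustration. In practice the vectors would come from a multilingual sentence encoder.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def align(src_vecs, tgt_vecs, threshold=0.5):
    """Greedily link each source sentence to its most similar target sentence.

    Hypothetical helper: returns (source_index, target_index) pairs and
    leaves a source sentence unaligned when its best score is below `threshold`.
    """
    links = []
    for i, s in enumerate(src_vecs):
        scores = [cosine(s, t) for t in tgt_vecs]
        if not scores:
            continue
        j = max(range(len(scores)), key=scores.__getitem__)
        if scores[j] >= threshold:
            links.append((i, j))
    return links


# Toy embeddings standing in for encoder outputs (assumed values).
src = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
tgt = [[0.9, 0.1, 0.0], [0.1, 0.0, 0.95], [0.0, 0.8, 0.2]]

links = align(src, tgt)
print(links)
```

A greedy rule like this ignores sentence order and many-to-one links, both of which matter in translated novels, so it should be read only as a baseline illustration of the similarity-based mapping described in the abstract.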