Teaching the Pre-trained Model to Generate Simple Texts for Text Simplification

Renliang Sun; Wei Xu; Xiaojun Wan

Findings: Generation Findings Paper

Session 7: Generation (Virtual Poster)

Conference Room: Pier 7&8

Conference Time: July 12, 11:00-12:30 (EDT) (America/Toronto)

Global Time: July 12, Session 7 (15:00-16:30 UTC)

Keywords: domain adaptation, text-to-text generation

Abstract: Randomly masking text spans in ordinary texts during pre-training does little to help models acquire the ability to generate simple texts, and it can hurt the performance of pre-trained models on text simplification tasks. In this paper, we propose a new continued pre-training strategy that teaches the pre-trained model to generate simple texts. We continue pre-training BART, a representative model, to obtain SimpleBART. SimpleBART consistently and significantly improves over BART on lexical simplification, sentence simplification, and document-level simplification tasks. Finally, we compare SimpleBART with several representative large language models (LLMs).
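
For readers unfamiliar with the baseline the abstract argues against, the following is a minimal sketch of random span masking on a tokenized sentence. It is an illustration only, not the authors' continued pre-training strategy, whose details the abstract does not give. The function name `mask_random_spans`, its parameters, the uniform span-length sampling, and the `<mask>` placeholder string are all assumptions made for this example.

```python
import random

MASK = "<mask>"  # placeholder mask string (assumed for illustration)


def mask_random_spans(tokens, mask_ratio=0.3, mean_span_len=3, seed=None):
    """Replace randomly chosen token spans with one mask token per span.

    Span positions are chosen without regard to which words are simple or
    complex. Span lengths are drawn uniformly here, a simplification chosen
    for this sketch.
    """
    rng = random.Random(seed)
    budget = int(len(tokens) * mask_ratio)  # max number of tokens to hide
    out = []
    i = 0
    while i < len(tokens):
        # Start a span with a probability that targets roughly mask_ratio.
        if budget > 0 and rng.random() < mask_ratio / mean_span_len:
            span = min(rng.randint(1, 2 * mean_span_len - 1),
                       budget, len(tokens) - i)
            out.append(MASK)  # the whole span collapses to a single mask
            i += span
            budget -= span
        else:
            out.append(tokens[i])
            i += 1
    return out


if __name__ == "__main__":
    sentence = "the committee postponed the vote until further notice".split()
    print(mask_random_spans(sentence, mask_ratio=0.5, seed=1))
```

Because the masked positions are picked at random, a model trained to reconstruct them is asked to regenerate ordinary and complex wording as often as simple wording, which is the limitation the abstract points to.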