Multi-Document Summarization with Centroid-Based Pretraining
Ratish Surendran Puduppully, Parag Jain, Nancy Chen, Mark Steedman
Main: Summarization Main-poster Paper
Poster Session 2: Summarization (Poster)
Conference Room: Frontenac Ballroom and Queen's Quay
Conference Time: July 10, 14:00-15:30 (EDT) (America/Toronto)
Global Time: July 10, Poster Session 2 (18:00-19:30 UTC)
Keywords:
multi-document summarization
TLDR:
In Multi-Document Summarization (MDS), the input can be modeled as a set of documents, and the output is its summary. In this paper, we focus on pretraining objectives for MDS. Specifically, we introduce a novel pretraining objective, which involves selecting the ROUGE-based centroid of each document cluster as a proxy for its summary.
Abstract:
In Multi-Document Summarization (MDS), the input can be modeled as a set of documents, and the output is its summary. In this paper, we focus on pretraining objectives for MDS. Specifically, we introduce a novel pretraining objective, which involves selecting the ROUGE-based centroid of each document cluster as a proxy for its summary. Our objective thus does not require human-written summaries and can be utilized for pretraining on a dataset consisting solely of document sets. Through zero-shot, few-shot, and fully supervised experiments on multiple MDS datasets, we show that our model, Centrum, performs better than or comparably to a state-of-the-art model. We make the pretrained and fine-tuned models freely available to the research community (https://github.com/ratishsp/centrum).
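To make the centroid-selection idea concrete, below is a minimal sketch of how a cluster's proxy summary could be picked, assuming the open-source rouge-score package. The choice of ROUGE variants, the F1 aggregation, and the helper name `centroid` are assumptions of this sketch, not necessarily the exact setup used for Centrum.

```python
# Sketch: select the ROUGE-based centroid of a document cluster as a
# proxy summary for pretraining. Requires `pip install rouge-score`.
# The ROUGE variants and their aggregation here are illustrative.
from rouge_score import rouge_scorer

scorer = rouge_scorer.RougeScorer(
    ["rouge1", "rouge2", "rougeL"], use_stemmer=True
)

def centroid(cluster: list[str]) -> str:
    """Return the document with the highest mean ROUGE F1
    against all other documents in the cluster."""
    if len(cluster) < 2:
        return cluster[0]
    best_doc, best_score = cluster[0], float("-inf")
    for i, candidate in enumerate(cluster):
        others = [d for j, d in enumerate(cluster) if j != i]
        # Average the candidate's ROUGE F1 against every other document.
        total = 0.0
        for reference in others:
            scores = scorer.score(reference, candidate)
            total += sum(s.fmeasure for s in scores.values()) / len(scores)
        mean = total / len(others)
        if mean > best_score:
            best_doc, best_score = candidate, mean
    return best_doc

# The selected centroid serves as the pseudo-summary target,
# and the remaining documents form the model input.
docs = ["Doc about topic A ...", "Another doc on topic A ...", "Third doc ..."]
target = centroid(docs)
inputs = [d for d in docs if d != target]
```

Because the centroid is chosen purely from pairwise ROUGE among the input documents, this procedure needs no human-written summaries, which is what allows pretraining on collections that consist only of document sets.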