Multi-Document Summarization with Centroid-Based Pretraining
Ratish Surendran Puduppully, Parag Jain, Nancy Chen, Mark Steedman
Main: Summarization Main-poster Paper
Poster Session 2: Summarization (Poster)
Conference Room: Frontenac Ballroom and Queen's Quay
Conference Time: July 10, 14:00-15:30 (EDT) (America/Toronto)
Global Time: July 10, Poster Session 2 (18:00-19:30 UTC)
Keywords:
multi-document summarization
TLDR:
In Multi-Document Summarization (MDS), the input can be modeled as a set of documents, and the output is its summary. In this paper, we focus on pretraining objectives for MDS. Specifically, we introduce a novel pretraining objective, which involves selecting the ROUGE-based centroid of each document cluster as a proxy for its summary.
Abstract:
In Multi-Document Summarization (MDS), the input can be modeled as a set of documents, and the output is its summary. In this paper, we focus on pretraining objectives for MDS. Specifically, we introduce a novel pretraining objective, which involves selecting the ROUGE-based centroid of each document cluster as a proxy for its summary. Our objective thus does not require human-written summaries and can be utilized for pretraining on a dataset consisting solely of document sets. Through zero-shot, few-shot, and fully supervised experiments on multiple MDS datasets, we show that our model, Centrum, performs better than or comparably to a state-of-the-art model. We make the pretrained and fine-tuned models freely available to the research community (https://github.com/ratishsp/centrum).
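To make the centroid-selection idea concrete, below is a minimal sketch of how a cluster's proxy summary could be picked, assuming the open-source rouge-score package. The choice of ROUGE variants, the F1 aggregation, and the helper name `centroid` are assumptions of this sketch, not necessarily the exact setup used for Centrum.

```python
# Sketch: select the ROUGE-based centroid of a document cluster as a
# proxy summary for pretraining. Requires `pip install rouge-score`.
# The ROUGE variants and their aggregation here are illustrative.
from rouge_score import rouge_scorer

scorer = rouge_scorer.RougeScorer(
    ["rouge1", "rouge2", "rougeL"], use_stemmer=True
)

def centroid(cluster: list[str]) -> str:
    """Return the document with the highest mean ROUGE F1
    against all other documents in the cluster."""
    if len(cluster) < 2:
        return cluster[0]
    best_doc, best_score = cluster[0], float("-inf")
    for i, candidate in enumerate(cluster):
        others = [d for j, d in enumerate(cluster) if j != i]
        # Average the candidate's ROUGE F1 against every other document.
        total = 0.0
        for reference in others:
            scores = scorer.score(reference, candidate)
            total += sum(s.fmeasure for s in scores.values()) / len(scores)
        mean = total / len(others)
        if mean > best_score:
            best_doc, best_score = candidate, mean
    return best_doc

# The selected centroid serves as the pseudo-summary target,
# and the remaining documents form the model input.
docs = ["Doc about topic A ...", "Another doc on topic A ...", "Third doc ..."]
target = centroid(docs)
inputs = [d for d in docs if d != target]
```

Because the centroid is chosen purely from pairwise ROUGE among the input documents, this procedure needs no human-written summaries, which is what allows pretraining on collections that consist only of document sets.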