[Industry] Multi-doc Hybrid Summarization via Salient Representation Learning

Min Xiao

Industry: Industry Paper

Session 5: Industry (Poster)
Conference Room: Frontenac Ballroom and Queen's Quay
Conference Time: July 11, 16:15-17:45 (EDT) (America/Toronto)
Global Time: July 11, Session 5 (20:15-21:45 UTC)
TLDR: Multi-document summarization has been gaining increasing attention and serves as an invaluable tool for obtaining key facts from a large pool of information. In this paper, we propose a multi-document hybrid summarization approach, which simultaneously generates a human-readable summary and extract...
Abstract: Multi-document summarization has been gaining increasing attention and serves as an invaluable tool for obtaining key facts from a large pool of information. In this paper, we propose a multi-document hybrid summarization approach that simultaneously generates a human-readable summary and extracts the corresponding key evidence from multi-document inputs. To that end, we craft a salient representation learning method that induces latent salient features, which are effective for joint evidence extraction and summary generation. To train the model, we use multi-task learning to optimize a composite loss, constructed hierarchically over the extractive and abstractive sub-components. We implement the system on a ubiquitously adopted transformer architecture and conduct experiments on multiple datasets across two domains, achieving superior performance over the baselines.
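
The abstract does not give the form of the composite loss. As a rough illustration only, one common multi-task formulation is a weighted sum of an extractive term and an abstractive term. The sketch below assumes binary cross-entropy over per-sentence evidence labels and token-level negative log-likelihood for the summary. The function names, the weight `alpha`, and the flat (non-hierarchical) combination are all hypothetical and are not taken from the paper.

```python
import math

EPS = 1e-12  # guards against log(0); an implementation detail of this sketch


def extractive_loss(salience_probs, evidence_labels):
    """Mean binary cross-entropy over per-sentence salience predictions.

    salience_probs: predicted probability that each sentence is key evidence.
    evidence_labels: 1 if the sentence is labeled as evidence, else 0.
    """
    total = 0.0
    for p, y in zip(salience_probs, evidence_labels):
        total -= y * math.log(p + EPS) + (1 - y) * math.log(1 - p + EPS)
    return total / len(salience_probs)


def abstractive_loss(target_token_probs):
    """Mean negative log-likelihood of the reference summary tokens.

    target_token_probs: probability the decoder assigned to each reference token.
    """
    nll = -sum(math.log(p + EPS) for p in target_token_probs)
    return nll / len(target_token_probs)


def composite_loss(salience_probs, evidence_labels, target_token_probs, alpha=0.5):
    """Weighted sum of the two task losses (alpha is a hypothetical weight)."""
    ext = extractive_loss(salience_probs, evidence_labels)
    abst = abstractive_loss(target_token_probs)
    return alpha * ext + (1.0 - alpha) * abst


if __name__ == "__main__":
    # Toy inputs: three source sentences, four summary tokens.
    loss = composite_loss(
        salience_probs=[0.9, 0.2, 0.7],
        evidence_labels=[1, 0, 1],
        target_token_probs=[0.8, 0.6, 0.9, 0.5],
        alpha=0.5,
    )
    print(f"composite loss: {loss:.4f}")
```

In this form, `alpha` trades evidence extraction against summary generation. In a real system both terms would be computed from shared encoder representations, so that their gradients jointly shape the learned salient features.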