[Demo] Lingxi: A Diversity-aware Chinese Modern Poetry Generation System

Xinran Zhang, Maosong Sun, Jiafeng Liu, Xiaobing Li

Demo: Linguistic Diversity (demo) Demo Paper

Demo Session 3: Linguistic Diversity (demo) (Poster)
Conference Room: Frontenac Ballroom and Queen's Quay
Conference Time: July 11, 09:00-10:30 (EDT) (America/Toronto)
Global Time: July 11, Demo Session 3 (13:00-14:30 UTC)
TLDR: Chinese modern poetry generation has been a challenging task. One issue is the Chinese word segmentation (CWS) which is critical to comprehend the Chinese language but was not always considered in common tokenization methods. Another is the decoding (sampling) method which may induce repetition and ...
You can open the #paper-D24 channel in a separate window.
Abstract: Chinese modern poetry generation has been a challenging task. One issue is the Chinese word segmentation (CWS) which is critical to comprehend the Chinese language but was not always considered in common tokenization methods. Another is the decoding (sampling) method which may induce repetition and boredom and severely lower the diversity of the generated poetry. To address these issues, we present Lingxi, a diversity-aware Chinese modern poetry generation system. For the CWS issue, we propose a novel framework that incorporates CWS in the tokenization process. The proposed method can achieve a high vocabulary coverage rate with a reasonable vocabulary size. For the decoding method and the diversity issue, we propose a novel sampling algorithm that flattens the high likelihood part of the predicted distribution of the language model to emphasize the comparatively low-likelihood words and increase the diversity of generated poetry. Empirical results show that even when the top 60% of cumulative probability mass of the predicted distribution is flattened, our method achieves comparable or even better performance than baseline sampling methods. Our system is available at http://lingxi.website.