Won't Get Fooled Again: Answering Questions with False Premises

Shengding Hu; Yifan Luo; Huadong Wang; Xingyi Cheng; Zhiyuan Liu; Maosong Sun

Won't Get Fooled Again: Answering Questions with False Premises

Shengding Hu, Yifan Luo, Huadong Wang, Xingyi Cheng, Zhiyuan Liu, Maosong Sun

📝 Paper

Anthology

Underline 🪧 Poster 🧑‍🏫 Slides 📺 Watch Video on Underline Add to Favorites

Main: Question Answering Main-oral Paper

Session 1: Question Answering (Oral)

Conference Room: Metropolitan West

Conference Time: July 10, 11:00-12:30 (EDT) (America/Toronto)

Global Time: July 10, Session 1 (15:00-16:30 UTC)

Keywords: generalization

TLDR: Pre-trained language models (PLMs) have shown unprecedented potential in various fields, especially as the backbones for question-answering (QA) systems. However, they tend to be easily deceived by tricky questions such as ``How many eyes does the sun have?''. Such frailties of PLMs often allude to ...

You can open the #paper-P3014 channel in a separate window.

Abstract: Pre-trained language models (PLMs) have shown unprecedented potential in various fields, especially as the backbones for question-answering (QA) systems. However, they tend to be easily deceived by tricky questions such as ``How many eyes does the sun have?''. Such frailties of PLMs often allude to the lack of knowledge within them. In this paper, we find that the PLMs already possess the knowledge required to rebut such questions, and the key is how to activate the knowledge. To systematize this observation, we investigate the PLMs' responses to one kind of tricky questions, i.e., the false premises questions (FPQs). We annotate a FalseQA dataset containing 2365 human-written FPQs, with the corresponding explanations for the false premises and the revised true premise questions. Using FalseQA, we discover that PLMs are capable of discriminating FPQs by fine-tuning on moderate numbers (e.g., 256) of examples. PLMs also generate reasonable explanations for the false premise, which serve as rebuttals. Further replaying a few general questions during training allows PLMs to excel on FPQs and general questions simultaneously. Our work suggests that once the rebuttal ability is stimulated, knowledge inside the PLMs can be effectively utilized to handle FPQs, which incentivizes the research on PLM-based QA systems. The FalseQA dataset and code are available at https://github.com/thunlp/FalseQA .