Evaluating Paraphrastic Robustness in Textual Entailment Models

Dhruv Verma, Yash Kumar Lal, Shreyashee Sinha, Benjamin Van Durme, Adam Poliak

Main: Semantics: Sentence-level Semantics, Textual Inference, and Other Areas Main-poster Paper

Poster Session 1: Semantics: Sentence-level Semantics, Textual Inference, and Other Areas (Poster)
Conference Room: Frontenac Ballroom and Queen's Quay
Conference Time: July 10, 11:00-12:30 (EDT) (America/Toronto)
Global Time: July 10, Poster Session 1 (15:00-16:30 UTC)
Keywords: textual entailment
TLDR: We present PaRTE, a collection of 1,126 pairs of Recognizing Textual Entailment (RTE) examples to evaluate whether models are robust to paraphrasing. We posit that if RTE models understand language, their predictions should be consistent across inputs that share the same meaning. We use the evaluati...
You can open the #paper-P4522 channel in a separate window.
Abstract: We present PaRTE, a collection of 1,126 pairs of Recognizing Textual Entailment (RTE) examples to evaluate whether models are robust to paraphrasing. We posit that if RTE models understand language, their predictions should be consistent across inputs that share the same meaning. We use the evaluation set to determine if RTE models' predictions change when examples are paraphrased. In our experiments, contemporary models change their predictions on 8-16\% of paraphrased examples, indicating that there is still room for improvement.