[SRW] Assessing Chain-of-Thought Reasoning against Lexical Negation: A Case Study on Syllogism

Mengyu Ye, Tatsuki Kuribayashi, Jun Suzuki, Hiroaki Funayama, Goro Kobayashi

Student Research Workshop (SRW) Paper

Session 4: Student Research Workshop (Oral)
Conference Room: Pier 2&3
Conference Time: July 11, 11:00-12:00 (EDT) (America/Toronto)
Global Time: July 11, Session 4 (15:00-16:00 UTC)
Abstract: Chain-of-thought (CoT) prompting, i.e., step-by-step reasoning instruction, enhances the performance of large language models (LLMs) in various tasks. Nevertheless, whether this gain indicates that LLMs have robust reasoning abilities remains unclear. In this study, we inspect the LLMs' step-by-step reasoning ability under controlled settings, with a particular focus on negation. Our results indicate that none of the tested LLMs were robust against lexical negation (e.g., plausible -> implausible) when performing CoT reasoning. Furthermore, our experiments at varying difficulty levels revealed that LLMs differ in which aspects of lexical negation they find difficult and in the output biases they exhibit.
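To make the experimental setup concrete, below is a minimal Python sketch of how a controlled prompt pair differing only in lexical negation might be constructed. The example items, the build_prompt helper, and the exact CoT instruction wording are illustrative assumptions, not the authors' materials.

```python
# Minimal sketch (not the authors' code): build a syllogism prompt pair,
# one with the original predicate and one with its lexical negation,
# each followed by a chain-of-thought instruction. The model call itself
# is left out; only prompt construction is shown.

COT_INSTRUCTION = "Let's think step by step."

def build_prompt(premise: str, conclusion: str) -> str:
    """Format a syllogism as a yes/no CoT query."""
    return (
        f"Premise: {premise}\n"
        f"Conclusion: {conclusion}\n"
        f"Is the conclusion valid? {COT_INSTRUCTION}"
    )

# Hypothetical item pair: negation is realized lexically via an
# antonymous prefix (plausible -> implausible), as in the abstract.
original = build_prompt(
    "All metals conduct electricity. Iron is a metal.",
    "It is plausible that iron conducts electricity.",
)
negated = build_prompt(
    "All metals conduct electricity. Iron is a metal.",
    "It is implausible that iron conducts electricity.",
)

for prompt in (original, negated):
    print(prompt, end="\n\n")
```

Keeping the premise fixed and flipping only the lexical item means any difference in the model's answers between the two prompts can be attributed to the negation itself rather than to other properties of the input.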