Deductive Additivity for Planning of Natural Language Proofs

Zayne Sprague; Kaj Bostrom; Swarat Chaudhuri; Greg Durrett

1st Workshop on Natural Language Reasoning and Structured Explanations (@ACL 2023) Long Paper

Abstract: Current natural language systems designed for multi-step claim validation typically operate in two phases: retrieve a set of relevant premise statements using heuristics (planning), then generate novel conclusions from those statements using a large language model (deduction). The planning step often requires expensive Transformer operations and does not scale to arbitrary numbers of premise statements. In this paper, we investigate whether efficient planning heuristics are possible via embedding spaces compatible with deductive reasoning. Specifically, we evaluate whether embedding spaces exhibit a property we call deductive additivity: the sum of premise statement embeddings should be close to the embeddings of conclusions based on those premises. We explore several sources of off-the-shelf dense embeddings, as well as fine-tuned embeddings from GPT-3 and sparse embeddings from BM25. We study embedding models both intrinsically, evaluating whether the property of deductive additivity holds, and extrinsically, using them to assist planning in natural language proof generation. Lastly, we create a dataset, Single-Step Reasoning Contrast (SSRC), to further probe performance on various reasoning types. Our findings suggest that while standard embedding methods frequently embed conclusions near the sums of their premises, they fall short of being effective heuristics and lack the ability to model certain categories of reasoning.
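Deductive additivity can be stated informally as E(p1) + E(p2) ≈ E(c), where E is an embedding function, p1 and p2 are premises, and c is a conclusion drawn from them. The Python sketch below illustrates one way that property might be measured intrinsically and then reused as a planning heuristic. It is a minimal sketch under stated assumptions, not the authors' implementation: cosine similarity as the closeness measure, two-premise reasoning steps, the toy 3-dimensional vectors, and every function name (`additivity_score`, `rank_of_conclusion`, `plan_next_pair`) are hypothetical choices made for illustration. A real setup would obtain the vectors from an embedding model such as those the paper evaluates.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple

# Toy stand-ins for sentence embeddings; a real pipeline would get these from
# an embedding model (dense or sparse). All function names are hypothetical.
Vector = List[float]


def add(u: Sequence[float], v: Sequence[float]) -> Vector:
    """Element-wise sum of two embedding vectors of equal length."""
    return [a + b for a, b in zip(u, v)]


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norms = math.sqrt(sum(a * a for a in u) * sum(b * b for b in v))
    return dot / norms if norms > 0 else 0.0


def additivity_score(p1: Sequence[float], p2: Sequence[float],
                     conclusion: Sequence[float]) -> float:
    """How close E(p1) + E(p2) is to E(conclusion) (assumed metric: cosine)."""
    return cosine(add(p1, p2), conclusion)


def rank_of_conclusion(p1: Sequence[float], p2: Sequence[float],
                       gold: Sequence[float],
                       distractors: List[Sequence[float]]) -> int:
    """1-based rank of the gold conclusion among candidates, scored against
    the premise sum. Rank 1 means the premise sum picks out the gold."""
    query = add(p1, p2)
    gold_score = cosine(query, gold)
    return 1 + sum(1 for d in distractors if cosine(query, d) > gold_score)


def plan_next_pair(premises: Dict[str, Vector],
                   goal: Sequence[float]) -> Optional[Tuple[str, str]]:
    """Hypothetical planning heuristic: pick the premise pair whose summed
    embedding is most similar to the goal. Premise vectors are computed once
    up front; scoring a pair needs only vector arithmetic, no model call."""
    best: Optional[Tuple[str, str]] = None
    best_score = -math.inf
    names = sorted(premises)
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            score = cosine(add(premises[a], premises[b]), goal)
            if score > best_score:
                best, best_score = (a, b), score
    return best


if __name__ == "__main__":
    p1, p2 = [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]
    print(additivity_score(p1, p2, [1.0, 1.0, 0.0]))  # 1.0
    print(rank_of_conclusion(p1, p2, [1.0, 1.0, 0.0],
                             [[0.0, 0.0, 1.0], [1.0, 0.0, 0.0]]))  # 1
    facts = {"a": p1, "b": p2, "c": [0.0, 0.0, 1.0]}
    print(plan_next_pair(facts, [1.0, 1.0, 0.0]))  # ('a', 'b')
```

The point of the sketch is cost: each premise is embedded once, and scoring a candidate pair then needs only a vector addition and a similarity computation rather than a Transformer pass over that pair. The exhaustive pair loop shown here is still quadratic in the number of premises; an approximate nearest-neighbor index over candidate sums is one possible alternative, again an assumption rather than something the abstract specifies.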