Improving Mathematics Tutoring With A Code Scratchpad

Shriyash Upadhyay, Etan Ginsberg, Chris Callison-Burch

18th Workshop on Innovative Use of NLP for Building Educational Applications (BEA)

TLDR: Large language models can solve reasoning tasks (like math problems) more effectively when they are allowed to generate rationales. However, a good tutoring system should not just generate solutions, but should also generate explanations and should be able to correct and guide students. We show that providing a code scratchpad improves performance on each tutoring step.
Abstract: Large language models can solve reasoning tasks (like math problems) more effectively when they are allowed to generate rationales. However, a good tutoring system should not just generate solutions, but should also generate explanations and should be able to correct and guide students. We show that providing a code scratchpad improves performance on each tutoring step with a grade-school mathematics dataset. On these tutoring tasks, GPT-3 models provided with a code scratchpad significantly outperform those given only a language scratchpad (77.7% vs. 48.7% cumulative accuracy).
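To make the core idea concrete, here is a minimal sketch of what a "code scratchpad" might look like: instead of reasoning in free text, the model emits a short Python program whose execution yields the answer. The problem, the generated scratchpad, and the `run_scratchpad` helper are all illustrative assumptions, not the authors' actual implementation.

```python
# Hedged sketch of a code scratchpad for grade-school math.
# The model would generate the code in `scratchpad`; the tutoring
# system executes it and reads out the result.

problem = ("Natalia sold clips to 48 of her friends in April, "
           "and then she sold half as many clips in May. "
           "How many clips did Natalia sell altogether?")

# A scratchpad a model might generate for this problem
# (hypothetical output, shown for illustration):
scratchpad = """
april = 48
may = april // 2
answer = april + may
"""

def run_scratchpad(code: str) -> int:
    # Execute the generated code in a fresh namespace and read the
    # conventional `answer` variable. A real system would sandbox this.
    namespace: dict = {}
    exec(code, namespace)
    return namespace["answer"]

print(run_scratchpad(scratchpad))  # 72
```

The appeal of this setup is that arithmetic is delegated to the Python interpreter, so the model only has to get the structure of the solution right, not the calculations.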