Instruction Induction: From Few Examples to Natural Language Task Descriptions

Or Honovich; Uri Shaham; Samuel R. Bowman; Omer Levy

Main: Interpretability and Analysis of Models for NLP Main-poster Paper

Session 7: Interpretability and Analysis of Models for NLP (Virtual Poster)

Conference Room: Pier 7&8

Conference Time: July 12, 11:00-12:30 (EDT) (America/Toronto)

Global Time: July 12, Session 7 (15:00-16:30 UTC)

Keywords: free-text/natural language explanations

TLDR: Large language models are able to perform a task by conditioning on a few input-output demonstrations - a paradigm known as in-context learning. We show that language models can explicitly infer an underlying task from a few demonstrations by prompting them to generate a natural language instruction...

Abstract: Large language models are able to perform a task by conditioning on a few input-output demonstrations - a paradigm known as in-context learning. We show that language models can explicitly infer an underlying task from a few demonstrations by prompting them to generate a natural language instruction that fits the examples. To explore this ability, we introduce the instruction induction challenge, compile a dataset consisting of 24 tasks, and define a novel evaluation metric based on executing the generated instruction. We discover that, to a large extent, the ability to generate instructions does indeed emerge when using a model that is both large enough and aligned to follow instructions; InstructGPT achieves 65.7% of human performance in our execution-based metric, while the original GPT-3 model reaches only 9.8% of human performance. This surprising result suggests that instruction induction might be a viable learning paradigm in and of itself, where instead of fitting a set of latent continuous parameters to the data, one searches for the best description in the natural language hypothesis space.
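
The two steps named in the abstract, prompting for an instruction and then scoring that instruction by executing it, can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the prompt wording, the function names `build_induction_prompt` and `execution_accuracy`, the exact-match scoring rule, and the `toy_execute` stand-in that replaces a real language model call. None of it is taken from the paper's implementation.

```python
from typing import Callable, Sequence, Tuple

Example = Tuple[str, str]  # (input, output) demonstration pair


def build_induction_prompt(demos: Sequence[Example]) -> str:
    """Format demonstrations and ask for an instruction that fits them.

    The template wording is hypothetical, not the paper's prompt.
    """
    parts = ["Here are input-output pairs produced by following one instruction."]
    for x, y in demos:
        parts.append(f"Input: {x}\nOutput: {y}")
    parts.append("The instruction was:")
    return "\n\n".join(parts)


def execution_accuracy(
    instruction: str,
    held_out: Sequence[Example],
    execute: Callable[[str, str], str],
) -> float:
    """Score an instruction by running it on held-out inputs.

    `execute(instruction, x)` stands for a model that follows the
    instruction on input x. Exact match after lowercasing and stripping
    is an assumed scoring rule.
    """
    if not held_out:
        raise ValueError("held_out must contain at least one example")
    correct = sum(
        execute(instruction, x).strip().lower() == y.strip().lower()
        for x, y in held_out
    )
    return correct / len(held_out)


def toy_execute(instruction: str, x: str) -> str:
    """Toy stand-in for a language model call (assumption for this sketch)."""
    if "first letter" in instruction.lower():
        return x[0]
    return x
```

In this sketch, an induced instruction such as "Write the first letter of the word." would be passed to `execution_accuracy` together with held-out pairs like `("cat", "c")`; an instruction that describes the wrong task scores lower because its executions no longer match the expected outputs.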