Text-to-SQL Error Correction with Language Models of Code

Ziru Chen, Shijie Chen, Michael White, Raymond Mooney, Ali Payani, Jayanth Srinivasa, Yu Su, Huan Sun

Main: NLP Applications Main-poster Paper

Poster Session 5: NLP Applications (Poster)
Conference Room: Frontenac Ballroom and Queen's Quay
Conference Time: July 11, 16:15-17:45 (EDT) (America/Toronto)
Global Time: July 11, Poster Session 5 (20:15-21:45 UTC)
Keywords: code generation and understanding
TLDR: Despite recent progress in text-to-SQL parsing, current semantic parsers are still not accurate enough for practical use. In this paper, we investigate how to build automatic text-to-SQL error correction models. Noticing that token-level edits are out of context and sometimes ambiguous, we propose building clause-level edit models instead.
Abstract: Despite recent progress in text-to-SQL parsing, current semantic parsers are still not accurate enough for practical use. In this paper, we investigate how to build automatic text-to-SQL error correction models. Noticing that token-level edits are out of context and sometimes ambiguous, we propose building clause-level edit models instead. In addition, while most language models of code are not specifically pre-trained for SQL, they know common data structures and their operations in programming languages such as Python. Thus, we propose a novel representation for SQL queries and their edits that adheres more closely to the pre-training corpora of language models of code. Our error correction model improves the exact set match accuracy of different parsers by 2.4-6.5 points and obtains up to a 4.3-point absolute improvement over two strong baselines.
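
To make the clause-level idea concrete, below is a minimal Python sketch. It is not the paper's actual representation or code; the dictionary layout, the example query, and names such as predicted_query, clause_edit, apply_clause_edit, and to_sql are illustrative assumptions. It only shows how a SQL query can be held as a clause-level data structure and how a correction can then be expressed as an edit to a whole clause rather than to isolated tokens.

# Illustrative sketch (not the paper's implementation): a SQL query as a
# clause-level data structure, and an error correction as a clause-level edit.

# Hypothetical (incorrect) parser output for:
#   SELECT name FROM singer WHERE age > 20 ORDER BY name
predicted_query = {
    "select":   ["singer.name"],
    "from":     ["singer"],
    "where":    ["singer.age > 20"],
    "order_by": ["singer.name ASC"],
}

# A clause-level edit replaces an entire clause, so the edit carries the
# clause's full context instead of ambiguous token-level insertions/deletions.
clause_edit = {
    "clause": "order_by",
    "old":    ["singer.name ASC"],
    "new":    ["singer.age DESC"],  # hypothetical corrected clause
}

def apply_clause_edit(query: dict, edit: dict) -> dict:
    """Apply a single clause-level edit and return the corrected query."""
    corrected = dict(query)
    corrected[edit["clause"]] = edit["new"]
    return corrected

def to_sql(query: dict) -> str:
    """Serialize the clause dictionary back into a SQL string."""
    parts = [
        "SELECT " + ", ".join(query["select"]),
        "FROM " + ", ".join(query["from"]),
    ]
    if query.get("where"):
        parts.append("WHERE " + " AND ".join(query["where"]))
    if query.get("order_by"):
        parts.append("ORDER BY " + ", ".join(query["order_by"]))
    return " ".join(parts)

if __name__ == "__main__":
    corrected = apply_clause_edit(predicted_query, clause_edit)
    print(to_sql(corrected))
    # SELECT singer.name FROM singer WHERE singer.age > 20 ORDER BY singer.age DESC

Because the query is stored as an ordinary Python dictionary of lists, this style of representation stays close to the data structures and operations that language models of code see during pre-training, which is the intuition behind the representation proposed in the paper.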