PromptAttack: Probing Dialogue State Trackers with Adversarial Prompts

Xiangjue Dong; Yun He; Ziwei Zhu; James Caverlee

PromptAttack: Probing Dialogue State Trackers with Adversarial Prompts

Xiangjue Dong, Yun He, Ziwei Zhu, James Caverlee

📝 Paper

Anthology

Underline 🧑‍🏫 Slides 📺 Watch Video on Underline Add to Favorites

Findings: Interpretability and Analysis of Models for NLP Findings Paper

Session 7: Interpretability and Analysis of Models for NLP (Virtual Poster)

Conference Room: Pier 7&8

Conference Time: July 12, 11:00-12:30 (EDT) (America/Toronto)

Global Time: July 12, Session 7 (15:00-16:30 UTC)

Spotlight Session: Spotlight - Metropolitan West (Spotlight)

Conference Room: Metropolitan West

Conference Time: July 10, 19:00-21:00 (EDT) (America/Toronto)

Global Time: July 10, Spotlight Session (23:00-01:00 UTC)

Keywords: adversarial attacks/examples/training

TLDR: A key component of modern conversational systems is the Dialogue State Tracker (or DST), which models a user's goals and needs. Toward building more robust and reliable DSTs, we introduce a prompt-based learning approach to automatically generate effective adversarial examples to probe DST models. T...

You can open the #paper-P5841 channel in a separate window.

Abstract: A key component of modern conversational systems is the Dialogue State Tracker (or DST), which models a user's goals and needs. Toward building more robust and reliable DSTs, we introduce a prompt-based learning approach to automatically generate effective adversarial examples to probe DST models. Two key characteristics of this approach are: (i) it only needs the output of the DST with no need for model parameters, and (ii) it can learn to generate natural language utterances that can target any DST. Through experiments over state-of-the-art DSTs, the proposed framework leads to the greatest reduction in accuracy and the best attack success rate while maintaining good fluency and a low perturbation ratio. We also show how much the generated adversarial examples can bolster a DST through adversarial training. These results indicate the strength of prompt-based attacks on DSTs and leave open avenues for continued refinement.