Adversarial Named-Entity Recognition with Word Attributions and Disentanglement

Xiaomeng Jin, Bhanukiran Vinzamuri, Sriram Venkatapathy, Heng Ji, Pradeep Natarajan

The Third Workshop on Trustworthy Natural Language Processing

Abstract: The problem of making Named Entity Recognition (NER) models robust to adversarial attacks has received widespread attention recently (Simoncini and Spanakis, 2021; Lin et al., 2021). Existing techniques for robustifying NER models rely on exhaustive perturbation of the input training data to generate adversarial examples, often producing examples that are not semantically equivalent to the original. In this paper, we employ word-attribution-guided perturbations that generate adversarial examples with comparable attack rates but at a lower modification rate. Our approach also uses disentanglement of entity and non-entity word representations as a mechanism to generate diverse and unbiased adversarial examples. Adversarial training based on our method improves the F1 score over the originally trained NER model by 8% and 18% on the CoNLL-2003 and OntoNotes 5.0 datasets, respectively.
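To make the attribution-guided perturbation idea concrete, the Python snippet below is a minimal sketch, not the authors' implementation: it scores each token with a simple leave-one-out attribution (the drop in model confidence when the token is masked) and substitutes only the top-k highest-scoring tokens, which keeps the modification rate low. The `model_prob` callable and the `synonyms` table are hypothetical stand-ins for a real NER model and a semantics-preserving substitution source.

```python
from typing import Callable, Dict, List, Tuple

def attribution_scores(tokens: List[str],
                       model_prob: Callable[[List[str]], float]) -> List[float]:
    """Leave-one-out attribution: confidence drop when each token is masked."""
    base = model_prob(tokens)
    scores = []
    for i in range(len(tokens)):
        masked = tokens[:i] + ["[MASK]"] + tokens[i + 1:]
        scores.append(base - model_prob(masked))
    return scores

def perturb_top_k(tokens: List[str],
                  scores: List[float],
                  synonyms: Dict[str, str],
                  k: int = 2) -> Tuple[List[str], float]:
    """Replace only the k highest-attribution tokens that have a substitute."""
    order = sorted(range(len(tokens)), key=lambda i: scores[i], reverse=True)
    out, changed = list(tokens), 0
    for i in order:
        if changed == k:
            break
        if out[i] in synonyms:
            out[i] = synonyms[out[i]]
            changed += 1
    return out, changed / len(tokens)  # second value is the modification rate

if __name__ == "__main__":
    sent = "John works at Acme in Paris".split()
    # Toy stand-in for an NER model's confidence; a real model goes here.
    toy_prob = lambda toks: 0.9 - 0.1 * toks.count("[MASK]")
    adv, rate = perturb_top_k(sent, attribution_scores(sent, toy_prob),
                              {"works": "serves", "in": "within"})
    print(adv, f"modification rate = {rate:.2f}")
```

Restricting edits to the top-k attributed tokens is what trades off modification rate against attack rate in this style of approach: fewer, better-targeted substitutions rather than exhaustive perturbation of the input.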