DIP: Dead code Insertion based Black-box Attack for Programming Language Model

CheolWon Na; YunSeok Choi; Jee-Hyong Lee

Main: Interpretability and Analysis of Models for NLP Main-poster Paper

Poster Session 4: Interpretability and Analysis of Models for NLP (Poster)

Conference Room: Frontenac Ballroom and Queen's Quay

Conference Time: July 11, 11:00-12:30 (EDT) (America/Toronto)

Global Time: July 11, Poster Session 4 (15:00-16:30 UTC)

Keywords: adversarial attacks/examples/training

Abstract: Automatic processing of source code, such as code clone detection and software vulnerability detection, is very helpful to software engineers. Large pre-trained Programming Language (PL) models, such as CodeBERT, GraphCodeBERT, and CodeT5, perform very well on these tasks. However, these PL models are vulnerable to adversarial examples generated with slight perturbations. Unlike natural language, an adversarial example of code must be semantics-preserving and compilable. Because of these requirements, it is hard to directly apply existing attack methods for natural language models. In this paper, we propose DIP (Dead code Insertion based Black-box Attack for Programming Language Model), a high-performance and effective black-box attack method that generates adversarial examples using dead code insertion. We evaluate the proposed method on 9 victim downstream-task large code models. Our method outperforms the state-of-the-art black-box attack in both attack efficiency and attack quality, while the generated adversarial examples compile and preserve semantic functionality.
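To make the idea of dead code insertion concrete, the following is a minimal sketch in Python, not the DIP algorithm itself. It assumes one simple form of dead code, an assignment to a variable that is never read, and it uses a syntax check plus a behavioral comparison as stand-ins for the "compilable" and "semantic-preserving" requirements. The abstract does not say how DIP selects or places the inserted code, so the helper names (`insert_dead_code`, `run_add`), the dead statement, and the insertion position are all hypothetical.

```python
import ast


def insert_dead_code(source: str, dead_stmt: str, line_index: int) -> str:
    """Insert a statement with no effect on behavior before the given line.

    Hypothetical helper for illustration only; not the authors' method.
    """
    lines = source.splitlines()
    # Reuse the indentation of the line we insert before, so the new
    # statement lands in the same block.
    target = lines[line_index] if line_index < len(lines) else ""
    indent = target[: len(target) - len(target.lstrip())]
    lines.insert(line_index, indent + dead_stmt)
    candidate = "\n".join(lines) + "\n"
    # Stand-in for the "compilable" requirement: must still parse.
    ast.parse(candidate)
    return candidate


ORIGINAL = '''def add(a, b):
    total = a + b
    return total
'''

# Assumed dead statement: a string assigned to a variable that is never read.
DEAD = '_unused_demo = "while i < n: i += 1"'


def run_add(src: str, a: int, b: int) -> int:
    """Execute the source and call its `add` function."""
    namespace = {}
    exec(src, namespace)
    return namespace["add"](a, b)


adversarial = insert_dead_code(ORIGINAL, DEAD, 1)

# Stand-in for the "semantic-preserving" requirement: same output as before.
assert run_add(ORIGINAL, 2, 3) == run_add(adversarial, 2, 3)
```

In a black-box attack setting, a candidate such as `adversarial` would then be submitted to the victim model, using only the model's outputs to judge whether the prediction changed. That query loop is omitted here because the abstract gives no details about it.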