Shielded Representations: Protecting Sensitive Attributes Through Iterative Gradient-Based Projection

Shadi Iskander, Kira Radinsky, Yonatan Belinkov

The Third Workshop on Trustworthy Natural Language Processing Paper

TLDR: This paper proposes a novel approach, called Iterative Gradient-Based Projection (IGBP), for removing non-linear encoded demographic information from neural representations. The method is evaluated on gender and race attributes using intrinsic and extrinsic metrics. The comprehensive results demonst
You can open the #paper-TrustNLP_49 channel in a separate window.
Abstract: This paper proposes a novel approach, called Iterative Gradient-Based Projection (IGBP), for removing non-linear encoded demographic information from neural representations. The method is evaluated on gender and race attributes using intrinsic and extrinsic metrics. The comprehensive results demonstrate the effectiveness of the proposed method.The paper got accepted to the Findings of ACL, reviews are included in the appendix.The parts of the paper which have been revised are colored in blue.