Shielded Representations: Protecting Sensitive Attributes Through Iterative Gradient-Based Projection
Shadi Iskander, Kira Radinsky, Yonatan Belinkov
The Third Workshop on Trustworthy Natural Language Processing Paper
TLDR:
This paper proposes a novel approach, called Iterative Gradient-Based Projection (IGBP), for removing non-linearly encoded demographic information from neural representations. The method is evaluated on gender and race attributes using intrinsic and extrinsic metrics. The comprehensive results demonstrate the effectiveness of the proposed method.
Abstract:
This paper proposes a novel approach, called Iterative Gradient-Based Projection (IGBP), for removing non-linearly encoded demographic information from neural representations. The method is evaluated on gender and race attributes using intrinsic and extrinsic metrics. The comprehensive results demonstrate the effectiveness of the proposed method. The paper was accepted to the Findings of ACL; reviews are included in the appendix. The parts of the paper that have been revised are colored in blue.
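The abstract names the technique but does not spell out its update rule, so the following is only a toy sketch of what a gradient-based projection step could look like, not the authors' implementation. It assumes a fixed, hypothetical non-linear probe `score` (positive outside the unit circle, negative inside) standing in for a trained attribute classifier, and it moves each representation along the probe's gradient until the probe's output is zero. The names `score`, `score_grad`, and `gradient_projection`, the 2-D data, and the step count are all illustrative assumptions.

```python
import numpy as np


def score(X):
    # Hypothetical non-linear probe (an assumption, not a trained classifier):
    # positive for points outside the unit circle, negative for points inside.
    return np.sum(X**2, axis=1) - 1.0


def score_grad(X):
    # Analytic gradient of `score` with respect to each row of X.
    return 2.0 * X


def gradient_projection(X, n_steps=6, eps=1e-12):
    # Repeatedly move each row along the probe's gradient by the amount that
    # would zero the probe's output under a first-order approximation.
    X = X.copy()
    for _ in range(n_steps):
        f = score(X)                              # probe output per row
        g = score_grad(X)                         # gradient per row
        g_norm_sq = np.sum(g**2, axis=1) + eps    # eps guards against zero gradients
        X = X - (f / g_norm_sq)[:, None] * g
    return X


# Toy 2-D "representations" whose radius encodes the protected attribute.
rng = np.random.default_rng(0)
angles = rng.uniform(0.0, 2.0 * np.pi, size=200)
radii = rng.uniform(0.5, 2.0, size=200)
X = np.stack([radii * np.cos(angles), radii * np.sin(angles)], axis=1)

X_proj = gradient_projection(X)
max_abs_score_after = float(np.max(np.abs(score(X_proj))))
print("largest |probe output| after projection:", max_abs_score_after)
```

In this toy setting every point ends up on the probe's decision boundary, so this particular probe can no longer separate the two groups. A full pipeline would presumably retrain a fresh probe on the projected representations and repeat, which is the sense in which such a procedure is iterative; that outer loop is omitted here.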