We propose CHRT (Control Hidden Representation Transformation), a controlled language generation framework that steers large language models to generate text pertaining to certain attributes (such as toxicity). CHRT gains attribute control by modifying the hidden representations of the base model through learned transformations. We employ a contrastive-learning framework to learn these transformations, which can be combined to gain multi-attribute control. We demonstrate the effectiveness of CHRT experimentally by comparing it against seven baselines over three attributes. CHRT outperforms all the baselines on the tasks of detoxification, positive sentiment steering, and text simplification while minimizing the loss in linguistic quality. Further, our approach has the lowest inference latency, adding only 0.01 seconds over the base model, making it the most suitable for high-performance production environments.
We open-source our code and release two novel datasets to further propel controlled language generation research.