TART: Improved Few-shot Text Classification Using Task-Adaptive Reference Transformation

Shuo Lei; Xuchao Zhang; Jianfeng He; Fanglan Chen; Chang-Tien Lu

Main: Machine Learning for NLP Main-poster Paper

Session 1: Machine Learning for NLP (Virtual Poster)

Conference Room: Pier 7&8

Conference Time: July 10, 11:00-12:30 (EDT) (America/Toronto)

Global Time: July 10, Session 1 (15:00-16:30 UTC)

Keywords: few-shot learning, meta learning

Abstract: Meta-learning has emerged as a popular technique for few-shot text classification and achieves state-of-the-art performance. However, the performance of existing approaches depends heavily on the inter-class variance of the support set. As a result, they perform well on tasks where the semantics of the sampled classes are distinct but fail to differentiate classes with similar semantics. In this paper, we propose a novel Task-Adaptive Reference Transformation (TART) network, which aims to enhance generalization by transforming the class prototypes to per-class fixed reference points in task-adaptive metric spaces. To further maximize the divergence between transformed prototypes in task-adaptive metric spaces, TART introduces a discriminative reference regularization among transformed prototypes. Extensive experiments on four benchmark datasets show that our method clearly outperforms the state-of-the-art models on all datasets. In particular, our model surpasses the state-of-the-art method by 7.4% and 5.4% in 1-shot and 5-shot classification on the 20 Newsgroups dataset, respectively.
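
To make the idea of mapping class prototypes onto per-class reference points more concrete, here is a minimal NumPy sketch. It is not the authors' implementation. It assumes mean-pooled prototypes, one-hot reference points, a per-task linear map fitted with a pseudo-inverse, and a negative mean pairwise distance as a stand-in for the discriminative regularizer. All function names and the toy embeddings are hypothetical.

```python
import numpy as np

def class_prototypes(support):
    """Average the support embeddings of each class: (N, K, D) -> (N, D)."""
    return support.mean(axis=1)

def fit_transformation(prototypes, references):
    """Assumed per-task linear map W (D, M) such that prototypes @ W ~= references."""
    return np.linalg.pinv(prototypes) @ references

def divergence_penalty(points):
    """Hypothetical regularizer: negative mean pairwise distance between points."""
    n = len(points)
    dists = [np.linalg.norm(points[i] - points[j])
             for i in range(n) for j in range(i + 1, n)]
    return -float(np.mean(dists))

def classify(queries, W, references):
    """Project queries with W and pick the nearest reference point."""
    projected = queries @ W
    dists = np.linalg.norm(projected[:, None, :] - references[None, :, :], axis=-1)
    return dists.argmin(axis=1)

# Toy 2-way 2-shot task with 3-dimensional embeddings (made-up numbers).
# The two classes are deliberately close together in the original space.
support = np.array([
    [[1.0, 0.0, 0.0], [0.8, 0.2, 0.0]],   # class 0
    [[0.9, 0.1, 0.1], [0.7, 0.3, 0.1]],   # class 1
])
references = np.eye(2)                      # assumed fixed reference points

prototypes = class_prototypes(support)
W = fit_transformation(prototypes, references)
transformed = prototypes @ W                # lands on the reference points
penalty = divergence_penalty(transformed)

queries = np.array([[0.9, 0.1, 0.0], [0.8, 0.2, 0.1]])
predictions = classify(queries, W, references).tolist()
```

In this toy task the two prototypes are only about 0.17 apart in the original embedding space, but after the task-specific map they sit on well-separated reference points, which is the intuition the abstract describes for classes with similar semantics.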