Low-Rank Updates of pre-trained Weights for Multi-Task Learning

Alexandre Daniel AUDIBERT, Massih R Amini, Konstantin Usevich, Marianne Clausel

Findings: Machine Learning for NLP Findings Paper

Session 4: Machine Learning for NLP (Virtual Poster)
Conference Room: Pier 7&8
Conference Time: July 11, 11:00-12:30 (EDT) (America/Toronto)
Global Time: July 11, Session 4 (15:00-16:30 UTC)
Spotlight Session: Spotlight - Metropolitan Centre (Spotlight)
Conference Room: Metropolitan Centre
Conference Time: July 10, 19:00-21:00 (EDT) (America/Toronto)
Global Time: July 10, Spotlight Session (23:00-01:00 UTC)
Keywords: multi-task learning
TLDR: Multi-Task Learning used with pre-trained models has been quite popular in the field of Natural Language Processing in recent years. This framework remains still challenging due to the complexity of the tasks and the challenges associated with fine-tuning large pre-trained models. In this paper, we ...
You can open the #paper-P2048 channel in a separate window.
Abstract: Multi-Task Learning used with pre-trained models has been quite popular in the field of Natural Language Processing in recent years. This framework remains still challenging due to the complexity of the tasks and the challenges associated with fine-tuning large pre-trained models. In this paper, we propose a new approach for Multi-task learning which is based on stacking the weights of Neural Networks as a tensor. We show that low-rank updates in the canonical polyadic tensor decomposition of this tensor of weights lead to a simple, yet efficient algorithm, which without loss of performance allows to reduce considerably the model parameters. We investigate the interactions between tasks inside the model as well as the inclusion of sparsity to find the best tensor rank and to increase the compression rate. Our strategy is consistent with recent efforts that attempt to use constraints to fine-tune some model components. More precisely, we achieve equivalent performance as the state-of-the-art on the General Language Understanding Evaluation benchmark by training only 0.3 of the parameters per task while not modifying the baseline weights.