Parameter-efficient Weight Ensembling Facilitates Task-level Knowledge Transfer

Xingtai Lv; Ning Ding; Yujia Qin; Zhiyuan Liu; Maosong Sun

Main: Large Language Models Main-poster Paper

Session 7: Large Language Models (Virtual Poster)

Conference Room: Pier 7&8

Conference Time: July 12, 11:00-12:30 (EDT) (America/Toronto)

Global Time: July 12, Session 7 (15:00-16:30 UTC)

Keywords: fine-tuning

Abstract: Recent studies show that large-scale pre-trained language models can be effectively adapted to particular tasks in a parameter-efficient manner. The resulting lightweight sets of trained parameters, such as adapters, can be easily stored and shared as capabilities that plug into the corresponding models. Given a collection of such lightweight parameters, we focus on transferring them between tasks to improve performance on new tasks; the key is to estimate the similarity between tasks. In this paper, we explore five parameter-efficient weight ensembling methods to achieve such transferability and verify their effectiveness. These methods extract information from datasets and trained lightweight parameters from different perspectives to estimate the similarity between tasks, then weight the existing lightweight parameters according to that similarity to obtain a suitable module for initializing new tasks. We apply them to three parameter-efficient tuning methods and test them on a wide set of downstream tasks. Experimental results show that our methods improve over baselines by 5% to 8% and can largely facilitate task-level knowledge transfer.
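The general idea of similarity-weighted ensembling can be illustrated with a minimal sketch. It is not one of the five methods from the paper, whose details the abstract does not give. The sketch assumes that each task is summarized by a fixed-length embedding vector, that similarity is measured by cosine similarity, and that similarities are turned into weights with a softmax; the names `cosine`, `softmax`, `ensemble_init`, and `temperature` are hypothetical and chosen only for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def softmax(scores, temperature=1.0):
    """Turn similarity scores into positive weights that sum to 1."""
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp((s - top) / temperature) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def ensemble_init(target_embedding, source_embeddings, source_params, temperature=1.0):
    """Build an initialization for a new task's lightweight module.

    Assumption (not from the paper): every source module is a dict that maps
    a parameter name to a flat list of floats, with identical names and sizes.
    """
    sims = [cosine(target_embedding, e) for e in source_embeddings]
    weights = softmax(sims, temperature)
    init = {}
    for name in source_params[0]:
        size = len(source_params[0][name])
        # Weighted average of the same parameter across all source modules.
        init[name] = [
            sum(w * p[name][i] for w, p in zip(weights, source_params))
            for i in range(size)
        ]
    return init, weights
```

In this sketch a lower `temperature` concentrates the weight on the most similar source task, while a higher one moves the result toward a plain average of all source modules. The returned dictionary would then serve as the starting point for tuning on the new task.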