BERTastic at SemEval-2023 Task 3: Fine-Tuning Pretrained Multilingual Transformers Does Order Matter?

Tarek Mahmoud, Preslav Nakov

The 17th International Workshop on Semantic Evaluation (SemEval-2023) Task 3: detecting the category, the framing, and the persuasion techniques in online news in a multi-lingual setup

TLDR: The naïve approach for fine-tuning pretrained deep learning models on downstream tasks involves feeding them mini-batches of randomly sampled data. In this paper, we propose a more elaborate method for fine-tuning Pretrained Multilingual Transformers (PMTs) on multilingual data.
Abstract: The naïve approach for fine-tuning pretrained deep learning models on downstream tasks involves feeding them mini-batches of randomly sampled data. In this paper, we propose a more elaborate method for fine-tuning Pretrained Multilingual Transformers (PMTs) on multilingual data. Inspired by the success of curriculum learning approaches, we investigate the significance of fine-tuning PMTs on multilingual data in a sequential fashion, language by language. Unlike the curriculum learning paradigm, where the model is presented with increasingly complex examples, we do not adopt a notion of "easy" and "hard" samples. Instead, our experiments draw insight from psychological findings on how the human brain processes new information and the persistence of newly learned concepts. We perform our experiments on a challenging news-framing dataset that contains texts in six languages. Our proposed method outperforms the naïve approach, achieving an improvement of 2.57% in terms of F1 score. Even when we supplement the naïve approach with recency fine-tuning, we still achieve an improvement of 1.34% with a 3.63% convergence speed-up. Moreover, we are the first to observe an interesting pattern in which deep learning models exhibit a human-like primacy-recency effect.
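
The abstract contrasts two fine-tuning regimes: the naïve baseline that shuffles all languages into random mini-batches, and the proposed sequential regime that fine-tunes language by language. The sketch below is not the authors' code; it uses a toy linear classifier as a stand-in for a pretrained multilingual transformer, synthetic data in place of the news-framing dataset, and an illustrative language list and ordering. The helper names (toy_dataset, train_on) are hypothetical.

```python
# Minimal sketch (assumptions noted above): naive random-mini-batch fine-tuning
# versus sequential, language-by-language fine-tuning.
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset, ConcatDataset

torch.manual_seed(0)
NUM_FEATURES, NUM_FRAMES = 32, 14                     # hypothetical feature/label sizes
LANGUAGES = ["en", "de", "fr", "it", "pl", "ru"]      # illustrative set of six languages

def toy_dataset(n=64):
    """Synthetic stand-in for encoded news articles of one language."""
    x = torch.randn(n, NUM_FEATURES)
    y = torch.randint(0, NUM_FRAMES, (n,))
    return TensorDataset(x, y)

per_language = {lang: toy_dataset() for lang in LANGUAGES}
model = nn.Linear(NUM_FEATURES, NUM_FRAMES)           # stand-in for a PMT + classifier head
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

def train_on(loader, epochs=1):
    model.train()
    for _ in range(epochs):
        for x, y in loader:
            optimizer.zero_grad()
            loss_fn(model(x), y).backward()
            optimizer.step()

# Naive baseline: all languages shuffled together into random mini-batches.
mixed = DataLoader(ConcatDataset(per_language.values()), batch_size=16, shuffle=True)
train_on(mixed)

# Sequential alternative: fine-tune one language at a time, so the most recently
# seen language is the freshest in the model's weights; the order is a design choice.
for lang in LANGUAGES:
    train_on(DataLoader(per_language[lang], batch_size=16, shuffle=True))
```

In practice the stand-in model would be replaced by a pretrained multilingual encoder (e.g., mBERT or XLM-R style) with a classification head, and the language order itself becomes the experimental variable the paper's title asks about.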