Benchmarking Offensive and Abusive Language in Dutch Tweets

Tommaso Caselli; Hylke Van Der Veen

The 7th Workshop on Online Abuse and Harms (WOAH), Long paper

TLDR: We present an extensive evaluation of different fine-tuned models to detect instances of offensive and abusive language in Dutch across three benchmarks: a standard held-out test, a task-agnostic functional benchmark, and a dynamic test set. We also investigate the use of data cartography to identi

Abstract: We present an extensive evaluation of different fine-tuned models to detect instances of offensive and abusive language in Dutch across three benchmarks: a standard held-out test, a task-agnostic functional benchmark, and a dynamic test set. We also investigate the use of data cartography to identify high-quality training data. Our results show that the manually annotated data used to train the models is of relatively good quality, while highlighting some critical weaknesses. We also found that trained models port well across the same language phenomena. As for data cartography, we found a positive impact only on the functional benchmark, and only when selecting data per annotated dimension rather than from the entire training material.
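
For readers unfamiliar with data cartography, the sketch below illustrates the general technique as it is usually described (Swayamdipta et al., 2020), not the specific setup of this paper. Each training example is characterised by the probability the model assigns to its gold label across training epochs: the mean gives its confidence, the standard deviation its variability, and the share of epochs with a correct prediction its correctness. The function names, the choice to keep the most variable examples, and the `fraction` parameter are all assumptions made for illustration; the abstract does not say which region of the map or which threshold the authors used.

```python
from statistics import mean, pstdev


def cartography_stats(gold_probs, correct_flags):
    """Summarise the training dynamics of a single example.

    gold_probs: probability assigned to the gold label at each epoch.
    correct_flags: whether the prediction was correct at each epoch.
    Returns (confidence, variability, correctness).
    """
    confidence = mean(gold_probs)        # mean gold-label probability
    variability = pstdev(gold_probs)     # population std across epochs
    correctness = sum(correct_flags) / len(correct_flags)
    return confidence, variability, correctness


def select_most_variable(stats_by_id, fraction=0.33):
    """Return the ids of the most variable examples (hypothetical criterion).

    stats_by_id: dict mapping example id -> (confidence, variability, correctness).
    fraction: share of the examples to keep (illustrative default).
    """
    ranked = sorted(stats_by_id, key=lambda i: stats_by_id[i][1], reverse=True)
    keep = max(1, round(len(ranked) * fraction))
    return ranked[:keep]
```

Selecting "per annotated dimension", as the abstract puts it, would under this sketch amount to calling `select_most_variable` separately on the examples of each annotation dimension rather than once on the whole training set.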