MELODI at SemEval-2023 Task 3: In-domain Pre-training for Low-resource Classification of News Articles

Nicolas Devatine, Philippe Muller, Chloé Braud

The 17th International Workshop on Semantic Evaluation (SemEval-2023), Task 3: Detecting the Category, the Framing, and the Persuasion Techniques in Online News in a Multi-lingual Setup

Abstract: This paper describes our approach to Subtask 1 "News Genre Categorization" of SemEval-2023 Task 3 "Detecting the Category, the Framing, and the Persuasion Techniques in Online News in a Multi-lingual Setup", which aims to determine whether a given news article is an opinion piece, an objective report, or satire. We fine-tuned the domain-specific language model POLITICS, which was pre-trained with ideology-driven objectives on a large-scale dataset of more than 3.6M English political news articles. To use this English-only model in the multilingual setup of the task, we added a pre-processing step that translates all documents into English. Our system ranked among the top systems in most languages and ranked 1st on the English dataset.
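The translate-then-classify pipeline summarized in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the `GenreClassifier` class, the label strings `opinion`/`reporting`/`satire`, and the toy `toy_translate` and `toy_score` functions are all hypothetical stand-ins for a real machine translation system and for POLITICS fine-tuned with a three-way classification head.

```python
from dataclasses import dataclass
from typing import Callable, Sequence

# Assumed label names for the three genres (hypothetical; official strings may differ).
LABELS = ("opinion", "reporting", "satire")


@dataclass
class GenreClassifier:
    """Hypothetical translate-then-classify pipeline.

    translate: maps (text, source_language) -> English text; a stand-in for
        whatever machine translation system is used as pre-processing.
    score: maps English text -> one score per label; a stand-in for a
        fine-tuned English encoder (e.g. POLITICS) with a 3-way head.
    """
    translate: Callable[[str, str], str]
    score: Callable[[str], Sequence[float]]

    def to_english(self, text: str, lang: str) -> str:
        # English documents pass through unchanged; all others are translated.
        return text if lang == "en" else self.translate(text, lang)

    def predict(self, text: str, lang: str) -> str:
        scores = list(self.score(self.to_english(text, lang)))
        if len(scores) != len(LABELS):
            raise ValueError(f"expected {len(LABELS)} scores, got {len(scores)}")
        # Argmax over label scores (ties resolve to the earliest label).
        best = max(range(len(LABELS)), key=lambda i: scores[i])
        return LABELS[best]


# --- Toy usage: these stand-ins are NOT a trained model or a real translator ---
def toy_translate(text: str, lang: str) -> str:
    # Hypothetical placeholder that only tags the text with its source language.
    return f"[{lang}->en] {text}"


def toy_score(text: str) -> list:
    # Keyword counts as fake per-label scores, for illustration only.
    lowered = text.lower()
    return [lowered.count("i think"), lowered.count("said"), lowered.count("area man")]


clf = GenreClassifier(translate=toy_translate, score=toy_score)
print(clf.predict("Area man wins nothing", "en"))  # → satire
```

Keeping translation and scoring as injected callables mirrors the abstract's point that translation is a separate pre-processing step: the English-only classifier never sees non-English input, and either component can be swapped without touching the other.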