Uppsala University at SemEval-2023 Task12: Zero-shot Sentiment Classification for Nigerian Pidgin Tweets

Annika Kniele, Meriem Beloucif

The 17th International Workshop on Semantic Evaluation (SemEval-2023) Task 12: AfriSenti-SemEval: Sentiment Analysis for Low-resource African Languages using Twitter Dataset

Abstract: While sentiment classification is often considered a practically solved task for high-resource languages such as English, the scarcity of data for many other languages still makes it challenging. The AfriSenti-SemEval shared task aims to classify the sentiment of Twitter data in 14 low-resource African languages. In our participation, we focus on Nigerian Pidgin as the target language. We investigate how English monolingual and multilingual pre-trained models affect sentiment classification for Nigerian Pidgin. Our setup includes zero-shot models (fine-tuned on English, Igbo and Hausa data) and a model fine-tuned on Nigerian Pidgin. Our results show that English fine-tuned models perform slightly better than models fine-tuned on other Nigerian languages, which could be explained by the lexical and structural closeness between Nigerian Pidgin and English. The best results were obtained by fine-tuning on the monolingual Nigerian Pidgin data. The model pre-trained on English and fine-tuned on Nigerian Pidgin was submitted to Task A Track 4 of the AfriSenti-SemEval Shared Task 12, where it ranked 25th out of 32.
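
The zero-shot versus in-language comparison described in the abstract can be illustrated with a minimal sketch. It is not the authors' system: a toy token-count classifier stands in for the pre-trained models, and the sentences, the labels, and the names `train`, `predict`, `accuracy`, `train_sets` and `results` are all made up for illustration. The only point is the shape of the experiment: fit on a source language, then evaluate unchanged on Nigerian Pidgin.

```python
from collections import Counter, defaultdict

def train(examples):
    """Count tokens per label (a toy stand-in for fine-tuning a model)."""
    counts = defaultdict(Counter)
    for text, label in examples:
        counts[label].update(text.lower().split())
    return counts

def predict(model, text, default="neutral"):
    """Pick the label whose training tokens overlap most with the text."""
    tokens = text.lower().split()
    scores = {label: sum(c[t] for t in tokens) for label, c in model.items()}
    best = max(scores, key=scores.get, default=default)
    # Fall back to the default label when no token was seen in training.
    return best if scores.get(best, 0) > 0 else default

def accuracy(model, examples):
    """Fraction of examples whose predicted label matches the gold label."""
    return sum(predict(model, t) == y for t, y in examples) / len(examples)

# Hypothetical toy data, invented for this sketch (not from AfriSenti).
train_sets = {
    "en (zero-shot)": [
        ("i love this film", "positive"),
        ("this food is bad", "negative"),
    ],
    "pcm (in-language)": [
        ("i like dis song die", "positive"),
        ("dis network bad gan", "negative"),
    ],
}
pidgin_test = [
    ("i love dis film well well", "positive"),
    ("dis food bad gan", "negative"),
    ("wetin dey happen", "neutral"),
]

# Each model is fitted on one source and evaluated on the same Pidgin test set.
results = {name: accuracy(train(data), pidgin_test)
           for name, data in train_sets.items()}

for name, score in results.items():
    print(f"{name}: accuracy on Pidgin test = {score:.2f}")
```

In this toy setting the English-trained classifier transfers only because the made-up Pidgin sentences share tokens such as "love" and "bad" with English, which loosely mirrors the lexical-closeness explanation offered in the abstract. The scores printed here say nothing about the paper's actual results.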