DuluthNLP at SemEval-2023 Task 12: AfriSenti-SemEval: Sentiment Analysis for Low-resource African Languages using Twitter Dataset

Samuel Akrah, Ted Pedersen

The 17th International Workshop on Semantic Evaluation (SemEval-2023), Task 12: AfriSenti-SemEval: Sentiment Analysis for Low-resource African Languages using Twitter Dataset (Paper)

TLDR: This paper describes the DuluthNLP system that participated in Task 12 of SemEval-2023 on AfriSenti-SemEval: Sentiment Analysis for Low-resource African Languages using Twitter Dataset. Given a set of tweets, the task requires participating systems to classify each tweet as negative, positive or neutral.
Abstract: This paper describes the DuluthNLP system that participated in Task 12 of SemEval-2023 on AfriSenti-SemEval: Sentiment Analysis for Low-resource African Languages using Twitter Dataset. Given a set of tweets, the task requires participating systems to classify each tweet as negative, positive or neutral. We evaluate a range of monolingual and multilingual pretrained models on the Twi language dataset, one of the 14 African languages included in the SemEval task. We introduce TwiBERT, a new pretrained model trained from scratch. We show that TwiBERT and mBERT generally perform best when trained on the Twi dataset, achieving an F1 score of 64.29% on the official evaluation test data, which ranks 14th out of 30 submissions for Track 10. The TwiBERT model is released at https://huggingface.co/sakrah/TwiBERT
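
As an illustration of how an F1 score for the three-way negative/neutral/positive classification can be computed, the following is a minimal sketch in plain Python. The abstract does not state which averaging scheme underlies the reported score, so both macro and support-weighted averages are shown as assumptions. The function names and the toy `gold` and `pred` labels are hypothetical and are not taken from the task data.

```python
from collections import Counter

# The three sentiment classes named in the task description.
LABELS = ("negative", "neutral", "positive")


def per_class_f1(gold, pred, label):
    """F1 for one class, treating that class as the positive one."""
    tp = sum(1 for g, p in zip(gold, pred) if g == label and p == label)
    fp = sum(1 for g, p in zip(gold, pred) if g != label and p == label)
    fn = sum(1 for g, p in zip(gold, pred) if g == label and p != label)
    if tp == 0:
        # No true positives: precision or recall is zero (or undefined).
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(gold, pred, labels=LABELS):
    """Unweighted mean of the per-class F1 scores."""
    return sum(per_class_f1(gold, pred, l) for l in labels) / len(labels)


def weighted_f1(gold, pred, labels=LABELS):
    """Mean of the per-class F1 scores, weighted by gold-label support."""
    support = Counter(gold)
    return sum(per_class_f1(gold, pred, l) * support[l] for l in labels) / len(gold)


# Hypothetical toy labels, for illustration only.
gold = ["positive", "positive", "negative", "negative", "neutral", "neutral"]
pred = ["positive", "negative", "negative", "negative", "neutral", "positive"]

print(f"macro F1:    {macro_f1(gold, pred):.4f}")
print(f"weighted F1: {weighted_f1(gold, pred):.4f}")
```

In this toy example every class has the same number of gold labels, so the two averages coincide; on an imbalanced dataset they would generally differ.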