Seals_Lab at SemEval-2023 Task 12:  Sentiment Analysis for Low-resource African Languages, Hausa and Igbo

Nilanjana Raychawdhary; Amit Das; Gerry Dozier; Cheryl D. Seals

Seals_Lab at SemEval-2023 Task 12: Sentiment Analysis for Low-resource African Languages, Hausa and Igbo

Nilanjana Raychawdhary, Amit Das, Gerry Dozier, Cheryl D. Seals

Add to Favorites

The 17th International Workshop on Semantic Evaluation (SemEval-2023) Task 12: afrisenti-semeval: sentiment analysis for low-resource african languages using twitter dataset Paper

TLDR: One of the most extensively researched applications in natural language processing (NLP) is sentiment analysis. While the majority of the study focuses on high-resource languages (e.g., English), this research will focus on low-resource African languages namely Igbo and Hausa. The annotated tweets o

RocketChat
Abstract

You can open the #paper-SemEval_228 channel in a separate window.

Abstract: One of the most extensively researched applications in natural language processing (NLP) is sentiment analysis. While the majority of the study focuses on high-resource languages (e.g., English), this research will focus on low-resource African languages namely Igbo and Hausa. The annotated tweets of both languages have a significant number of code-mixed tweets. The curated datasets necessary to build complex AI applications are not available for the majority of African languages. To optimize the use of such datasets, research is needed to determine the viability of present NLP procedures as well as the development of novel techniques. This paper outlines our efforts to develop a sentiment analysis (for positive and negative as well as neutral) system for tweets from the Hausa, and Igbo languages. Sentiment analysis can computationally analyze and discover sentiments in a text or document. We worked on the first thorough compilation of AfriSenti-SemEval 2023 Shared Task 12 Twitter datasets that are human-annotated for the most widely spoken languages in Nigeria, such as Hausa and Igbo. Here we trained the modern pre-trained language model AfriBERTa large on the AfriSenti-SemEval Shared Task 12 Twitter dataset to create sentiment classification. In particular, the results demonstrate that our model trained on AfriSenti-SemEval Shared Task 12 datasets and produced with an F1 score of 80.85\% for Hausa and 80.82\% for Igbo languages on the sentiment analysis test. In AfriSenti-SemEval 2023 shared task 12 (Task A), we consistently ranked top 10 by achieving a mean F1 score of more than 80\% for both the Hausa and Igbo languages.