Masakhane-Afrisenti at SemEval-2023 Task 12: Sentiment Analysis using Afro-centric Language Models and Adapters for Low-resource African Languages
Israel Abebe Azime, Sana Al-azzawi, Atnafu Lambebo Tonja, Iyanuoluwa Shode, Jesujoba Alabi, Ayodele Awokoya, Mardiyyah Oduwole, Tosin Adewumi, Samuel Fanijo, Awosan Oyinkansola
The 17th International Workshop on Semantic Evaluation (SemEval-2023) Task 12: afrisenti-semeval: sentiment analysis for low-resource african languages using twitter dataset Paper
TLDR:
Detecting harmful content on social media plat-forms is crucial in preventing the negative ef-fects these posts can have on social media users.This paper presents our methodology for tack-ling task 10 from SemEval23, which focuseson detecting and classifying online sexism insocial media posts. We co
You can open the
#paper-SemEval_200
channel in a separate window.
Abstract:
Detecting harmful content on social media plat-forms is crucial in preventing the negative ef-fects these posts can have on social media users.This paper presents our methodology for tack-ling task 10 from SemEval23, which focuseson detecting and classifying online sexism insocial media posts. We constructed our solu-tion using an ensemble of transformer-basedmodels (that have been fine-tuned; BERTweet,RoBERTa, and DeBERTa). To alleviate the var-ious issues caused by the class imbalance inthe dataset provided and improve the general-ization of our model, our framework employsdata augmentation and semi-supervised learn-ing. Specifically, we use back-translation fordata augmentation in two scenarios: augment-ing the underrepresented class and augment-ing all classes. In this study, we analyze theimpact of these different strategies on the sys-tem's overall performance and determine whichtechnique is the most effective. Extensive ex-periments demonstrate the efficacy of our ap-proach. For sub-task A, the system achievedan F1-score of 0.8613. The source code to re-produce the proposed solutions is available onGithub