AraMUS: Pushing the Limits of Data and Model Scale for Arabic Natural Language Processing

Asaad Alghamdi, Xinyu Duan, Wei Jiang, Zhenhai Wang, Yimeng Wu, Qingrong Xia, Zhefeng Wang, Yi Zheng, Mehdi Rezagholizadeh, Baoxing Huai, Peilun Cheng, Abbas Ghaddar

Findings: Large Language Models Findings Paper

Session 4: Large Language Models (Virtual Poster)
Conference Room: Pier 7&8
Conference Time: July 11, 11:00-12:30 (EDT) (America/Toronto)
Global Time: July 11, Session 4 (15:00-16:30 UTC)
Keywords: pre-training
Languages: Arabic
Abstract: Developing monolingual large Pre-trained Language Models (PLMs) has proven very successful in handling a wide range of tasks in Natural Language Processing (NLP). In this work, we present AraMUS, the largest Arabic PLM, with 11B parameters trained on 529GB of high-quality Arabic textual data. AraMUS achieves state-of-the-art performance on a diverse set of Arabic classification and generative tasks. Moreover, AraMUS shows impressive few-shot learning abilities compared with the best existing Arabic PLMs.