Deep Active Learning for Morphophonological Processing

Seyed Morteza Mirbostani; Yasaman Boreshban; Salam Khalifa; SeyedAbolghasem Mirroshandel; Owen Rambow

Main: Phonology, Morphology, and Word Segmentation Main-poster Paper

Poster Session 2: Phonology, Morphology, and Word Segmentation (Poster)

Conference Room: Frontenac Ballroom and Queen's Quay

Conference Time: July 10, 14:00-15:30 (EDT) (America/Toronto)

Global Time: July 10, Poster Session 2 (18:00-19:30 UTC)

Keywords: morphological inflection

Languages: Arabic

Abstract: Building a morphological processing system is challenging for morphologically complex languages such as Arabic. Although some deep-learning-based models achieve successful results, they rely on large amounts of annotated data. Building such datasets, especially for some of the lower-resource Arabic dialects, is difficult, time-consuming, and expensive. In addition, some parts of the annotated data do not contain useful information for training machine learning models. Active learning strategies allow the learning algorithm to select the most informative samples for annotation. Little research has focused on applying active learning to morphological inflection and morphophonological processing. In this paper, we propose a deep active learning method for this task. Our experiments on Egyptian Arabic show that with only about 30% of the annotated data, we achieve the same results as the state-of-the-art model trained on the whole dataset.
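To make the idea of selecting the most informative samples concrete, here is a minimal sketch of one common active learning heuristic, entropy-based uncertainty sampling over an unlabeled pool. It is an illustration only and is not taken from the paper: the function names, the pool items, and the probability values are all hypothetical placeholders, and the abstract does not say which selection criterion the authors actually use.

```python
import math
from typing import Dict, List, Sequence


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of a probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def select_for_annotation(
    pool_probs: Dict[str, Sequence[float]], budget: int
) -> List[str]:
    """Return the `budget` pool items with the highest predictive entropy."""
    ranked = sorted(
        pool_probs,
        key=lambda item: entropy(pool_probs[item]),
        reverse=True,  # most uncertain first
    )
    return ranked[:budget]


# Hypothetical model outputs: each unlabeled item maps to the model's
# probability distribution over candidate outputs (placeholder numbers,
# not results from the paper).
pool = {
    "word_a": [0.98, 0.01, 0.01],  # model is confident
    "word_b": [0.40, 0.35, 0.25],  # model is very uncertain
    "word_c": [0.70, 0.20, 0.10],  # model is moderately uncertain
}

# Items the annotator would be asked to label in this round.
chosen = select_for_annotation(pool, budget=2)
print(chosen)
```

In a full loop, the chosen items would be annotated, added to the training set, and the model retrained before the next selection round. Other criteria, such as margin-based or committee-based selection, would slot into the same structure by swapping the scoring function.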