Exploring the Impact of Transliteration on NLP Performance for Low-Resource Languages: The Case of Maltese and Arabic
Kurt Micallef, Fadhl Eryani, Nizar Habash, Houda Bouamor, Claudia Borg
The Workshop on Computation and Written Language (CAWL) Paper
Abstract:
Maltese is a low-resource language of Arabic and Romance origins written in Latin script. We explore the impact of transliterating Maltese into Arabic script on a number of downstream tasks. We compare multiple transliteration pipelines, ranging from simple one-to-one character maps to more sophisticated alternatives that explore multiple possibilities or make use of manual linguistic annotations. We show that the sophisticated systems are consistently better than the simpler systems, both quantitatively and qualitatively. We also show that transliterating Maltese can be considered as an option to improve cross-lingual transfer capabilities.
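
To make the simplest end of that range concrete, here is a minimal sketch of a one-to-one character map in Python. The mapping table and the function name transliterate are illustrative assumptions, not the mapping or code used in the paper: the table covers only a handful of consonants with fairly direct Arabic counterparts, and it ignores vowels, digraphs such as "għ" and "ie", and every context-dependent choice that the more sophisticated pipelines are meant to handle.

```python
# Hypothetical, deliberately incomplete character map (an assumption for
# illustration only; this is NOT the mapping used in the paper).
CHAR_MAP = {
    "b": "ب", "d": "د", "f": "ف", "k": "ك", "l": "ل", "m": "م",
    "n": "ن", "r": "ر", "s": "س", "t": "ت", "ħ": "ح", "q": "ق",
    "x": "ش", "j": "ي", "w": "و", "ż": "ز",
}


def transliterate(text: str, char_map: dict[str, str] = CHAR_MAP) -> str:
    """Map each lowercased character through char_map.

    Characters without an entry (vowels, digits, punctuation, ...) are
    passed through unchanged, so the output may mix scripts.
    """
    return "".join(char_map.get(ch, ch) for ch in text.lower())
```

Because every input character maps to at most one output character, a table like this cannot choose between several plausible Arabic spellings of the same Maltese letter. That is the kind of limitation the abstract's "more sophisticated alternatives" address.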