HaVQA: A Dataset for Visual Question Answering and Multimodal Research in Hausa Language

Shantipriya Parida, Idris Abdulmumin, Shamsuddeen Muhammad, Aneesh Bose, Guneet Kohli, Ibrahim Ahmad, Ketan Kotwal, Sayan Deb Sarkar, Ondřej Bojar, Habeebah Kakudi

The 17th Linguistic Annotation Workshop (LAW-XVII) @ ACL 2023

Abstract: This paper presents HaVQA, the first multimodal dataset for visual question-answering (VQA) tasks in the Hausa language. The dataset was created by manually translating 6,022 English question-answer pairs, which are associated with 1,555 unique images from the Visual Genome dataset. As a result, the dataset provides 12,044 gold-standard English-Hausa parallel sentences that were translated in a fashion that guarantees their semantic match with the corresponding visual information. We conducted several baseline experiments on the dataset, including visual question answering, visual question elicitation, and text-only and multimodal machine translation.