Ebhaam at SemEval-2023 Task 1: A CLIP-Based Approach for Comparing Cross-modality and Unimodality in Visual Word Sense Disambiguation

Zeinab Taghavi, Parsa Haghighi Naeini, Mohammad Ali Sadraei Javaheri, Soroush Gooran, Ehsaneddin Asgari, Hamid Reza Rabiee, Hossein Sameti

The 17th International Workshop on Semantic Evaluation (SemEval-2023) Task-1 - visual word sense disambiguation (visual-wsd) Paper

TLDR: This paper presents an approach to tackle the task of Visual Word Sense Disambiguation (Visual-WSD), which involves determining the most appropriate image to represent a given polysemous word in one of its particular senses. The proposed approach leverages the CLIP model, prompt engineering, and tex
You can open the #paper-SemEval_295 channel in a separate window.
Abstract: This paper presents an approach to tackle the task of Visual Word Sense Disambiguation (Visual-WSD), which involves determining the most appropriate image to represent a given polysemous word in one of its particular senses. The proposed approach leverages the CLIP model, prompt engineering, and text-to-image models such as GLIDE and DALL-E 2 for both image retrieval and generation. To evaluate our approach, we participated in the SemEval 2023 shared task on ``Visual Word Sense Disambiguation (Visual-WSD)'' using a zero-shot learning setting, where we compared the accuracy of different combinations of tools, including ``Simple prompt-based'' methods and ``Generated prompt-based'' methods for prompt engineering using completion models, and text-to-image models for changing input modality from text to image. Moreover, we explored the benefits of cross-modality evaluation between text and candidate images using CLIP. Our experimental results demonstrate that the proposed approach reaches better results than cross-modality approaches, highlighting the potential of prompt engineering and text-to-image models to improve accuracy in Visual-WSD tasks. We assessed our approach in a zero-shot learning scenario and attained an accuracy of 68.75\textbackslash{}\% in our best attempt.