PMCoders at SemEval-2023 Task 1: RAltCLIP: Use Relative AltCLIP Features to Rank
Mohammad Javad Pirhadi, Motahhare Mirzaei, Mohammad Reza Mohammadi, Sauleh Eetemadi
The 17th International Workshop on Semantic Evaluation (SemEval-2023) Task-1 - visual word sense disambiguation (visual-wsd) Paper
Abstract:
The Visual Word Sense Disambiguation (VWSD) task is to select, from 10 candidate images, the one most related to an ambiguous word given in a limited textual context. In this work, we use AltCLIP features and a standard 3-layer transformer encoder, and compare the cosine similarity between the given phrase and each candidate image. We also improve our model's generalization by using a subset of LAION-5B. The best official baseline achieves a macro-averaged hit rate of 37.20% and a macro-averaged MRR (Mean Reciprocal Rank) of 54.39%. Our best configuration reaches 39.61% and 56.78%, respectively. The code will be made publicly available on GitHub.
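As a rough illustration of the ranking and metrics mentioned in the abstract, the sketch below ranks candidate images by cosine similarity to a phrase vector and then computes hit rate and MRR over a set of instances. It is not the authors' implementation: the small hand-written vectors stand in for AltCLIP features, the transformer encoder is omitted, and the function names (`cosine`, `rank_images`, `hit_rate_and_mrr`) are hypothetical. It also assumes non-zero vectors and scores a single evaluation set; the abstract's macro-averaged figures would additionally average such scores across sets.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def rank_images(text_vec, image_vecs):
    """Return candidate indices ordered from most to least similar to the text."""
    sims = [cosine(text_vec, v) for v in image_vecs]
    return sorted(range(len(image_vecs)), key=lambda i: sims[i], reverse=True)


def hit_rate_and_mrr(rankings, gold):
    """Hit rate (gold image ranked first) and mean reciprocal rank of the gold image."""
    n = len(gold)
    hits = sum(1 for r, g in zip(rankings, gold) if r[0] == g)
    reciprocal_ranks = sum(1.0 / (r.index(g) + 1) for r, g in zip(rankings, gold))
    return hits / n, reciprocal_ranks / n


# Toy stand-ins for phrase and image features (assumed values, 3 candidates each).
instances = [
    ([1.0, 0.0], [[0.0, 1.0], [1.0, 0.1], [0.5, 0.5]]),
    ([0.0, 1.0], [[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]),
]
gold = [1, 2]  # index of the correct image for each instance

rankings = [rank_images(text, images) for text, images in instances]
hit_rate, mrr = hit_rate_and_mrr(rankings, gold)
print(rankings, hit_rate, mrr)
```

In this toy case the correct image is ranked first for the first instance and second for the second, so the hit rate is 0.5 and the MRR is (1 + 1/2) / 2 = 0.75.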