SLT at SemEval-2023 Task 1: Enhancing Visual Word Sense Disambiguation through Image Text Retrieval using BLIP
Mohammadreza Molavi, Hossein Zeinali
The 17th International Workshop on Semantic Evaluation (SemEval-2023), Task 1: Visual Word Sense Disambiguation (Visual-WSD), Paper
Abstract:
Building on recent progress in image-text retrieval, this paper presents a fine-tuned model for the Visual Word Sense Disambiguation (VWSD) task. The proposed system fine-tunes a pre-trained model (BLIP) using image-text contrastive (ITC) and image-text matching (ITM) losses and employs a candidate selection approach for faster inference. The system was trained on the VWSD task dataset and evaluated on a separate test set using the Mean Reciprocal Rank (MRR) metric. It was also tested on the provided test set, which contains Persian and Italian instances, and the results were evaluated for each language separately. Our proposed system demonstrates the potential of fine-tuning pre-trained models for complex language tasks and provides insights for further research in image-text retrieval.
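As a rough illustration of two ideas named in the abstract, the sketch below computes Mean Reciprocal Rank and shows one common way a candidate selection step can speed up inference: a cheap score shortlists the top k images, and only that shortlist is re-scored by a more expensive scorer. The abstract does not describe the authors' actual selection procedure, so the two-stage scheme here is an assumption, and every name (`reciprocal_rank`, `mean_reciprocal_rank`, `select_then_rerank`, `coarse_scores`, `fine_score`, `k`) is hypothetical.

```python
from typing import Callable, Mapping, Sequence


def reciprocal_rank(ranked: Sequence[str], gold: str) -> float:
    """Return 1/rank of the gold item in a ranked list, or 0.0 if it is absent."""
    for position, candidate in enumerate(ranked, start=1):
        if candidate == gold:
            return 1.0 / position
    return 0.0


def mean_reciprocal_rank(
    rankings: Sequence[Sequence[str]], golds: Sequence[str]
) -> float:
    """Average the reciprocal rank over all evaluation instances."""
    if len(rankings) != len(golds):
        raise ValueError("rankings and golds must have the same length")
    if not golds:
        return 0.0
    total = sum(reciprocal_rank(r, g) for r, g in zip(rankings, golds))
    return total / len(golds)


def select_then_rerank(
    candidates: Sequence[str],
    coarse_scores: Mapping[str, float],
    fine_score: Callable[[str], float],
    k: int,
) -> list[str]:
    """Hypothetical two-stage ranking (an assumption, not the paper's method).

    Stage 1 sorts all candidates by a cheap precomputed score (for example,
    an embedding similarity). Stage 2 calls the expensive scorer only on the
    top-k shortlist; the remaining candidates keep their coarse order.
    """
    by_coarse = sorted(candidates, key=lambda c: coarse_scores[c], reverse=True)
    shortlist, rest = by_coarse[:k], by_coarse[k:]
    reranked = sorted(shortlist, key=fine_score, reverse=True)
    return reranked + rest


if __name__ == "__main__":
    # Toy scores invented for illustration only.
    coarse = {"img_a": 0.9, "img_b": 0.8, "img_c": 0.1}
    fine = {"img_a": 0.2, "img_b": 0.7, "img_c": 0.99}
    ranking = select_then_rerank(list(coarse), coarse, fine.__getitem__, k=2)
    print(ranking)
    print(mean_reciprocal_rank([ranking], ["img_a"]))
```

With a small `k`, the expensive scorer runs on only a few images per query, which is where the inference speedup would come from; the trade-off is that a gold image missed by the coarse stage cannot be promoted to the top of the shortlist.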