ECNU_MIV at SemEval-2023 Task 1: CTIM - Contrastive Text-Image Model for Multilingual Visual Word Sense Disambiguation

Zhenghui Li, Qi Zhang, Xueyin Xia, Yinxiang Ye, Qi Zhang, Cong Huang

The 17th International Workshop on Semantic Evaluation (SemEval-2023) Task-1 - visual word sense disambiguation (visual-wsd) Paper

TLDR: Our team focuses on the multimodal domain of images and texts, we propose a model that can learn the matching relationship between text-image pairs by contrastive learning. More specifically, We train the model from the labeled data provided by the official organizer, after pre-training, texts are u
You can open the #paper-SemEval_16 channel in a separate window.
Abstract: Our team focuses on the multimodal domain of images and texts, we propose a model that can learn the matching relationship between text-image pairs by contrastive learning. More specifically, We train the model from the labeled data provided by the official organizer, after pre-training, texts are used to reference learned visual concepts enabling visual word sense disambiguation tasks. In addition, the top results our teams get have been released showing the effectiveness of our solution.