Sartipi-Sedighin at SemEval-2023 Task 2: Fine-grained Named Entity Recognition with Pre-trained Contextual Language Models and Data Augmentation from Wikipedia

Amir Sartipi, Amirreza Sedighin, Afsaneh Fatemi, Hamidreza Baradaran Kashani

The 17th International Workshop on Semantic Evaluation (SemEval-2023), Task 2: MultiCoNER II, Multilingual Complex Named Entity Recognition

TLDR: This paper presents the system developed by the Sartipi-Sedighin team for SemEval 2023 Task 2, a shared task focused on multilingual complex named entity recognition (NER), or MultiCoNER II. The goal of this task is to identify and classify complex named entities (NEs) in text across multiple languages.
Abstract: This paper presents the system developed by the Sartipi-Sedighin team for SemEval 2023 Task 2, a shared task focused on multilingual complex named entity recognition (NER), or MultiCoNER II. The goal of this task is to identify and classify complex named entities (NEs) in text across multiple languages. To tackle the MultiCoNER II task, we leveraged pre-trained language models (PLMs) fine-tuned for each language included in the dataset. In addition, we applied a data augmentation technique to increase the amount of training data available to our models. Specifically, we searched Wikipedia for relevant NEs that already existed in the training data and added new instances of these entities to our training corpus. Our team achieved an overall F1 score of 61.25% in the English track and 71.79% in the multilingual track, among the 13 tracks of the shared task to which we submitted.
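As a rough illustration of the Wikipedia-based augmentation described in the abstract, the sketch below searches Wikipedia for an entity string that already appears in the training data, keeps summary sentences that mention it, and converts them into BIO-tagged examples. This is a minimal sketch, assuming the `wikipedia` Python package and a simple whitespace tokenizer; the function name, entity, and label are illustrative placeholders, not the authors' actual implementation.

```python
import wikipedia

def augment_from_wikipedia(entity: str, label: str, lang: str = "en", max_sentences: int = 5):
    """Collect BIO-tagged sentences from Wikipedia that mention a known training entity.

    `entity` is an NE surface form already present in the training data and
    `label` is its gold type; both are hypothetical examples for this sketch.
    """
    wikipedia.set_lang(lang)
    augmented = []

    # Look up the most relevant article for the entity string.
    hits = wikipedia.search(entity, results=1)
    if not hits:
        return augmented
    try:
        page = wikipedia.page(hits[0], auto_suggest=False)
    except wikipedia.exceptions.WikipediaException:
        return augmented

    # Keep only summary sentences that contain the entity surface form.
    for sentence in page.summary.split(". "):
        if entity not in sentence:
            continue
        tokens = sentence.split()
        tags = ["O"] * len(tokens)
        entity_tokens = entity.split()
        # Naive span match: BIO-tag the first whitespace-token occurrence of the entity.
        for i in range(len(tokens) - len(entity_tokens) + 1):
            if tokens[i : i + len(entity_tokens)] == entity_tokens:
                tags[i] = f"B-{label}"
                for j in range(i + 1, i + len(entity_tokens)):
                    tags[j] = f"I-{label}"
                break
        augmented.append((tokens, tags))
        if len(augmented) >= max_sentences:
            break
    return augmented

# Example usage: pull extra training sentences for an entity seen in the training set.
new_examples = augment_from_wikipedia("Golden Gate Bridge", "Facility")
```

Sentences gathered this way could then be appended to the per-language training corpus before fine-tuning the corresponding PLM, which is the role data augmentation plays in the system described above.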