Improving Self-training for Cross-lingual Named Entity Recognition with Contrastive and Prototype Learning
Ran Zhou, Xin Li, Lidong Bing, Erik Cambria, Chunyan Miao
Main Conference: Multilingualism and Cross-Lingual NLP (Poster Paper)
Poster Session 2: Multilingualism and Cross-Lingual NLP (Poster)
Conference Room: Frontenac Ballroom and Queen's Quay
Conference Time: July 10, 14:00-15:30 (EDT) (America/Toronto)
Global Time: July 10, Poster Session 2 (18:00-19:30 UTC)
Keywords:
cross-lingual transfer
Abstract:
In cross-lingual named entity recognition (NER), self-training is commonly used to bridge the linguistic gap by training on pseudo-labeled target-language data. However, due to sub-optimal performance on target languages, the pseudo labels are often noisy and limit the overall performance.
In this work, we aim to improve self-training for cross-lingual NER by combining representation learning and pseudo label refinement in one coherent framework.
Our proposed method, ContProto, mainly comprises two components: (1) contrastive self-training and (2) prototype-based pseudo-labeling.
Our contrastive self-training facilitates span classification by separating clusters of different classes, and enhances cross-lingual transferability by producing closely aligned representations between the source and target languages.
Meanwhile, prototype-based pseudo-labeling effectively improves the accuracy of pseudo labels during training.
We evaluate ContProto on multiple transfer pairs, and experimental results show that our method brings substantial improvements over current state-of-the-art methods.
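To make the contrastive self-training component more concrete, the following is a minimal PyTorch sketch of a supervised contrastive loss over span representations drawn from mixed source- and target-language batches, where target-language spans carry pseudo labels. The function name, tensor shapes, and temperature are illustrative assumptions, not the paper's exact objective.

```python
import torch
import torch.nn.functional as F

def supervised_contrastive_loss(reps: torch.Tensor,
                                labels: torch.Tensor,
                                temperature: float = 0.1) -> torch.Tensor:
    """Pull span representations of the same entity class together and push
    different classes apart. Batches mix source spans (gold labels) and
    target spans (pseudo labels), which encourages cross-lingual alignment.
    Hypothetical sketch; not the paper's exact formulation."""
    reps = F.normalize(reps, dim=-1)                     # cosine-similarity space
    sim = reps @ reps.T / temperature                    # (N, N) pairwise similarities
    n = reps.size(0)
    self_mask = torch.eye(n, dtype=torch.bool, device=reps.device)
    pos_mask = (labels.unsqueeze(0) == labels.unsqueeze(1)) & ~self_mask

    sim = sim.masked_fill(self_mask, float("-inf"))      # exclude self-comparisons
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)

    # Average log-probability over positive pairs for each anchor span
    pos_counts = pos_mask.sum(dim=1).clamp(min=1)        # avoid division by zero
    sum_pos = torch.where(pos_mask, log_prob, torch.zeros_like(log_prob)).sum(dim=1)
    per_anchor = -sum_pos / pos_counts
    return per_anchor[pos_mask.any(dim=1)].mean()        # anchors with >= 1 positive


# Example: 4 spans (2 source-language, 2 target-language) sharing classes
reps = torch.randn(4, 768)
labels = torch.tensor([1, 2, 1, 2])    # e.g. PER = 1, ORG = 2
loss = supervised_contrastive_loss(reps, labels)
```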
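Likewise, prototype-based pseudo-labeling can be sketched as maintaining one prototype vector per entity class and re-assigning target-language pseudo labels to the nearest prototype as training progresses. The momentum update rule and class/shape conventions below are assumptions for illustration, not the paper's exact procedure.

```python
import torch
import torch.nn.functional as F

class PrototypePseudoLabeler:
    """Keep one prototype per entity class and relabel target-language spans
    by their nearest prototype. Hypothetical sketch: the momentum value and
    update rule are assumptions."""

    def __init__(self, num_classes: int, dim: int, momentum: float = 0.99):
        self.momentum = momentum
        self.prototypes = F.normalize(torch.randn(num_classes, dim), dim=-1)

    @torch.no_grad()
    def update(self, reps: torch.Tensor, labels: torch.Tensor) -> None:
        """Nudge each class prototype toward the mean representation of spans
        currently assigned to that class (exponential moving average)."""
        reps = F.normalize(reps, dim=-1)
        for c in labels.unique():
            class_mean = reps[labels == c].mean(dim=0)
            self.prototypes[c] = (self.momentum * self.prototypes[c]
                                  + (1.0 - self.momentum) * class_mean)
        self.prototypes = F.normalize(self.prototypes, dim=-1)

    @torch.no_grad()
    def refine(self, reps: torch.Tensor) -> torch.Tensor:
        """Return refined pseudo labels: the most similar prototype per span."""
        reps = F.normalize(reps, dim=-1)
        return (reps @ self.prototypes.T).argmax(dim=-1)


# Example: refine noisy pseudo labels for 3 target-language spans
labeler = PrototypePseudoLabeler(num_classes=5, dim=768)
span_reps = torch.randn(3, 768)
noisy_labels = torch.tensor([0, 3, 3])
labeler.update(span_reps, noisy_labels)   # update prototypes with current assignments
refined = labeler.refine(span_reps)       # re-assign each span to its nearest prototype
```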