Uncertainty Guided Label Denoising for Document-level Distant Relation Extraction

Qi Sun, Kun Huang, Xiaocui Yang, Pengfei Hong, Kun Zhang, Soujanya Poria

Main: Information Extraction Main-oral Paper

Session 7: Information Extraction (Oral)
Conference Room: Metropolitan Centre
Conference Time: July 12, 11:00-11:45 (EDT) (America/Toronto)
Global Time: July 12, Session 7 (15:00-15:45 UTC)
Keywords: named entity recognition and relation extraction, document-level extraction
TLDR: Document-level relation extraction (DocRE) aims to infer complex semantic relations among entities in a document. Distant supervision (DS) is able to generate massive auto-labeled data, which can improve DocRE performance. Recent works leverage pseudo labels generated by the pre-denoising model to r...
You can open the #paper-P430 channel in a separate window.
Abstract: Document-level relation extraction (DocRE) aims to infer complex semantic relations among entities in a document. Distant supervision (DS) is able to generate massive auto-labeled data, which can improve DocRE performance. Recent works leverage pseudo labels generated by the pre-denoising model to reduce noise in DS data. However, unreliable pseudo labels bring new noise, e.g., adding false pseudo labels and losing correct DS labels. Therefore, how to select effective pseudo labels to denoise DS data is still a challenge in document-level distant relation extraction. To tackle this issue, we introduce uncertainty estimation technology to determine whether pseudo labels can be trusted. In this work, we propose a Document-level distant Relation Extraction framework with Uncertainty Guided label denoising, UGDRE. Specifically, we propose a novel instance-level uncertainty estimation method, which measures the reliability of the pseudo labels with overlapping relations. By further considering the long-tail problem, we design dynamic uncertainty thresholds for different types of relations to filter high-uncertainty pseudo labels. We conduct experiments on two public datasets. Our framework outperforms strong baselines by 1.91 F1 and 2.28 Ign F1 on the RE-DocRED dataset.