Evaluating Zero-Shot Event Structures: Recommendations for Automatic Content Extraction (ACE) Annotations

Erica Cai, Brendan O'Connor

Main: Resources and Evaluation (Main-poster Paper)

Poster Session 6: Resources and Evaluation (Poster)
Conference Room: Frontenac Ballroom and Queen's Quay
Conference Time: July 12, 09:00-10:30 (EDT) (America/Toronto)
Global Time: July 12, 13:00-14:30 (UTC)
Keywords: evaluation methodologies, evaluation
TLDR: Zero-shot event extraction (EE) methods infer richly structured event records from text, based only on a minimal user specification and no training examples, which enables flexibility in exploring and developing applications. Most event extraction research uses the Automatic Content Extraction (ACE) annotated dataset to evaluate supervised EE methods, but can it be used to evaluate zero-shot and other low-supervision EE?
Abstract: Zero-shot event extraction (EE) methods infer richly structured event records from text, based only on a minimal user specification and no training examples, which enables flexibility in exploring and developing applications. Most event extraction research uses the Automatic Content Extraction (ACE) annotated dataset to evaluate supervised EE methods, but can it be used to evaluate zero-shot and other low-supervision EE? We describe ACE's event structures and identify significant ambiguities and issues in current evaluation practice, including (1) coreferent argument mentions, (2) conflicting argument head conventions, and (3) ignorance of modality and event class details. By sometimes mishandling these subtleties, current work may dramatically understate the actual performance of zero-shot and other low-supervision EE, considering up to 32% of correctly identified arguments and 25% of correctly ignored event mentions as false negatives. For each issue, we propose recommendations for future evaluations so the research community can better utilize ACE as an event evaluation resource.
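
To make issue (1) concrete, below is a minimal Python sketch of a coreference-aware argument match; the function name, span encoding, and cluster format are hypothetical illustrations, not the paper's actual evaluation code. A strict evaluator that compares a prediction only against the single annotated gold mention scores a prediction of a coreferent mention as a false negative; a coreference-aware check instead accepts any mention in the gold argument's entity cluster.

    # Hypothetical sketch of a coreference-aware argument match (issue (1)).
    # Span encoding (token offsets) and cluster format are illustrative assumptions.
    def argument_is_correct(pred_span, gold_span, coref_clusters):
        """True if pred_span matches the annotated gold mention or any
        mention coreferent with it; strict matching checks only the first case."""
        if pred_span == gold_span:
            return True
        for cluster in coref_clusters:  # each cluster: a set of (start, end) spans for one entity
            if gold_span in cluster and pred_span in cluster:
                return True
        return False

    # Example: gold annotates "the president"; the system extracts the coreferent "Obama".
    clusters = [{(3, 4), (10, 11)}]  # one entity with two mentions
    print(argument_is_correct((3, 4), (10, 11), clusters))   # True under coref-aware matching
    print(argument_is_correct((5, 6), (10, 11), clusters))   # False: not a mention of the entity

Under strict span matching, the first prediction would be among the up-to-32% of correctly identified arguments that current evaluation practice scores as false negatives.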