Evaluating Zero-Shot Event Structures: Recommendations for Automatic Content Extraction (ACE) Annotations

Erica Cai, Brendan O'Connor

Main: Resources and Evaluation (Main-poster Paper)

Poster Session 6: Resources and Evaluation (Poster)
Conference Room: Frontenac Ballroom and Queen's Quay
Conference Time: July 12, 09:00-10:30 (EDT) (America/Toronto)
Global Time: July 12, 13:00-14:30 (UTC)
Keywords: evaluation methodologies, evaluation
TLDR: Zero-shot event extraction (EE) methods infer richly structured event records from text, based only on a minimal user specification and no training examples, which enables flexibility in exploring and developing applications. Most event extraction research uses the Automatic Content Extraction (ACE) annotated dataset to evaluate supervised EE methods, but can it be used to evaluate zero-shot and other low-supervision EE?
Abstract: Zero-shot event extraction (EE) methods infer richly structured event records from text, based only on a minimal user specification and no training examples, which enables flexibility in exploring and developing applications. Most event extraction research uses the Automatic Content Extraction (ACE) annotated dataset to evaluate supervised EE methods, but can it be used to evaluate zero-shot and other low-supervision EE? We describe ACE's event structures and identify significant ambiguities and issues in current evaluation practice, including (1) coreferent argument mentions, (2) conflicting argument head conventions, and (3) ignorance of modality and event class details. By sometimes mishandling these subtleties, current work may dramatically understate the actual performance of zero-shot and other low-supervision EE, considering up to 32% of correctly identified arguments and 25% of correctly ignored event mentions as false negatives. For each issue, we propose recommendations for future evaluations so the research community can better utilize ACE as an event evaluation resource.
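
To make issue (1) concrete, below is a minimal Python sketch of a coreference-aware argument match; the function name, span encoding, and cluster format are hypothetical illustrations, not the paper's actual evaluation code. A strict evaluator that compares a prediction only against the single annotated gold mention scores a prediction of a coreferent mention as a false negative; a coreference-aware check instead accepts any mention in the gold argument's entity cluster.

    # Hypothetical sketch of a coreference-aware argument match (issue (1)).
    # Span encoding (token offsets) and cluster format are illustrative assumptions.
    def argument_is_correct(pred_span, gold_span, coref_clusters):
        """True if pred_span matches the annotated gold mention or any
        mention coreferent with it; strict matching checks only the first case."""
        if pred_span == gold_span:
            return True
        for cluster in coref_clusters:  # each cluster: a set of (start, end) spans for one entity
            if gold_span in cluster and pred_span in cluster:
                return True
        return False

    # Example: gold annotates "the president"; the system extracts the coreferent "Obama".
    clusters = [{(3, 4), (10, 11)}]  # one entity with two mentions
    print(argument_is_correct((3, 4), (10, 11), clusters))   # True under coref-aware matching
    print(argument_is_correct((5, 6), (10, 11), clusters))   # False: not a mention of the entity

Under strict span matching, the first prediction would be among the up-to-32% of correctly identified arguments that current evaluation practice scores as false negatives.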