Decomposed scoring of CCG dependencies

Aditya Bhargava; Gerald Penn

Main: Syntax: Tagging, Chunking, and Parsing Main-poster Paper

Poster Session 3: Syntax: Tagging, Chunking, and Parsing (Poster)

Conference Room: Frontenac Ballroom and Queen's Quay

Conference Time: July 11, 09:00-10:30 (EDT) (America/Toronto)

Global Time: July 11, Poster Session 3 (13:00-14:30 UTC)

Keywords: parsing algorithms (symbolic, theoretical results)

Abstract: In statistical parsing with CCG, the standard evaluation method is based on predicate-argument structure and evaluates dependencies labelled in part by lexical categories. When a predicate has multiple argument slots that can be filled, the same lexical category is used for the label of multiple dependencies. In this paper, we show that this evaluation can result in disproportionate penalization of supertagging errors and obfuscate the truly erroneous dependencies. Enabled by the compositional nature of CCG lexical categories, we propose *decomposed scoring* based on subcategorial labels to address this. To evaluate our scoring method, we engage fellow categorial grammar researchers in two English-language judgement tasks: (1) directly ranking the outputs of the standard and experimental scoring methods; and (2) determining which of two sentences has the better parse in cases where the two scoring methods disagree on their ranks. Overall, the judges prefer decomposed scoring in each task; but there is substantial disagreement among the judges in 24% of the given cases, pointing to potential issues with parser evaluations in general.
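The penalization effect described in the abstract can be illustrated with a small sketch. It is not the paper's implementation: the `Dep` tuple, the `arg_cat` field, the slot numbering, and the two key functions are hypothetical, and labelling each dependency by the category of the slot it fills is only one assumed reading of "subcategorial labels". The toy sentence is "Kim gave Lee books", where the parser attaches every argument correctly but assigns "gave" the wrong supertag.

```python
from typing import Callable, Iterable, NamedTuple, Tuple


class Dep(NamedTuple):
    head: int       # index of the predicate word
    category: str   # full lexical category (supertag) of the predicate
    slot: int       # argument slot number (illustrative numbering)
    arg_cat: str    # category of that slot (hypothetical subcategorial label)
    dependent: int  # index of the word filling the slot


def f1(gold: set, pred: set) -> float:
    """Labelled F1 over two sets of dependency keys."""
    correct = len(gold & pred)
    if correct == 0:
        return 0.0
    precision = correct / len(pred)
    recall = correct / len(gold)
    return 2 * precision * recall / (precision + recall)


def standard_key(d: Dep) -> Tuple:
    # Standard-style label: the whole lexical category is part of every dependency.
    return (d.head, d.category, d.slot, d.dependent)


def decomposed_key(d: Dep) -> Tuple:
    # Assumed decomposed label: only the category of the filled slot.
    return (d.head, d.slot, d.arg_cat, d.dependent)


def score(gold: Iterable[Dep], pred: Iterable[Dep], key: Callable[[Dep], Tuple]) -> float:
    return f1({key(d) for d in gold}, {key(d) for d in pred})


# Words: 0 Kim, 1 gave, 2 Lee, 3 books
GOLD_CAT = r"((S\NP)/NP)/NP"
PRED_CAT = r"((S\NP)/PP)/NP"  # one supertagging error: slot 2 is PP instead of NP

gold = [
    Dep(1, GOLD_CAT, 1, "NP", 0),
    Dep(1, GOLD_CAT, 2, "NP", 3),
    Dep(1, GOLD_CAT, 3, "NP", 2),
]
pred = [
    Dep(1, PRED_CAT, 1, "NP", 0),
    Dep(1, PRED_CAT, 2, "PP", 3),
    Dep(1, PRED_CAT, 3, "NP", 2),
]

standard = score(gold, pred, standard_key)
decomposed = score(gold, pred, decomposed_key)
print(f"standard F1:   {standard:.2f}")
print(f"decomposed F1: {decomposed:.2f}")
```

Under these assumptions, the single supertag error makes all three dependencies of "gave" count as wrong when the full category is part of the label, while the slot-level labels mark only the dependency for the mislabelled slot as wrong.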