Predicting Numerals in Text Using Nearest Neighbor Language Models

Taku Sakamoto, Akiko Aizawa

Findings Paper: Semantics: Sentence-level Semantics, Textual Inference, and Other Areas

Session 1: Semantics: Sentence-level Semantics, Textual Inference, and Other Areas (Virtual Poster)
Conference Room: Pier 7&8
Conference Time: July 10, 11:00-12:30 (EDT) (America/Toronto)
Global Time: July 10, Session 1 (15:00-16:30 UTC)
Keywords: reasoning
Abstract: Commonsense about quantitative properties is essential for a deep understanding of texts containing numerals. However, naive language models (LMs) treat numerals as string tokens and therefore lack an understanding of their magnitudes, which makes acquiring this commonsense difficult. In this study, we apply the $k$-nearest neighbor LM ($k$NN-LM) to the masked numeral prediction (MNP) task, which measures the quantitative commonsense of LMs. $k$NN-LM extends pre-trained neural LMs with a $k$-nearest neighbor ($k$NN) search. Because it can exploit patterns that appear in the datastore at prediction time, we expect improved accuracy on numeral prediction, a setting marked by a high rate of out-of-vocabulary (OOV) words. Through experiments, we verified that the retrieval-based method is effective for fine-grained prediction of numerals from context, especially for OOV numerals. We also compared two context spans for the context representations, aiming to improve the accuracy of the $k$NN search by using only words closely related to the masked numeral: the mask and its surrounding words, and the mask and its subsequent words. Our results reveal that using only the embeddings of the mask tokens for numerals in the $k$NN search is the most effective approach for the MNP task.
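The abstract's core mechanism, interpolating a base LM distribution with a distribution built from $k$NN retrieval over a datastore of context embeddings, can be sketched as follows. This is a minimal illustration of the general $k$NN-LM idea, not the authors' implementation: all names (`knn_lm_predict`, the datastore arrays, the interpolation weight `lam`) are hypothetical, and the distance, temperature, and interpolation choices are assumptions.

```python
import numpy as np

def knn_lm_predict(query_emb, datastore_keys, datastore_values,
                   lm_probs, vocab_size, k=4, temperature=1.0, lam=0.5):
    """Interpolate a base LM distribution with a k-NN retrieval distribution.

    query_emb:        (d,)   embedding of the masked position
    datastore_keys:   (N, d) stored context embeddings
    datastore_values: (N,)   vocabulary ids of the numerals observed there
    lm_probs:         (V,)   base LM distribution over the vocabulary
    """
    # Squared Euclidean distance from the query to every stored key.
    dists = np.sum((datastore_keys - query_emb) ** 2, axis=1)
    nn = np.argsort(dists)[:k]  # indices of the k nearest neighbors

    # Softmax over negative distances -> weights on the retrieved tokens.
    weights = np.exp(-dists[nn] / temperature)
    weights /= weights.sum()

    # Scatter the neighbor weights into a distribution over the vocabulary
    # (np.add.at accumulates when the same numeral is retrieved twice).
    knn_probs = np.zeros(vocab_size)
    np.add.at(knn_probs, datastore_values[nn], weights)

    # Final distribution: lam * p_kNN + (1 - lam) * p_LM.
    return lam * knn_probs + (1 - lam) * lm_probs
```

Because the retrieval distribution puts mass only on numerals actually seen in similar contexts, it can recover rare or OOV numerals that the base LM's softmax spreads thin, which is the effect the paper investigates.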