Going Beyond Sentence Embeddings: A Token-Level Matching Algorithm for Calculating Semantic Textual Similarity

Hongwei Wang; Dong Yu

Main: Semantics: Sentence-level Semantics, Textual Inference, and Other Areas Main-poster Paper

Poster Session 1: Semantics: Sentence-level Semantics, Textual Inference, and Other Areas (Poster)

Conference Room: Frontenac Ballroom and Queen's Quay

Conference Time: July 10, 11:00-12:30 (EDT) (America/Toronto)

Global Time: July 10, Poster Session 1 (15:00-16:30 UTC)

Keywords: semantic textual similarity

Abstract: Semantic Textual Similarity (STS) measures the degree to which the underlying semantics of paired sentences are equivalent. State-of-the-art methods for the STS task use language models to encode sentences into embeddings. However, these embeddings are limited in representing semantics because they mix all the semantic information together in fixed-length vectors, so the information is difficult to recover and the embeddings lack explainability. This paper presents a token-level matching inference algorithm that can be applied on top of any language model to improve its performance on the STS task. Our method calculates pairwise token-level similarity and token matching scores, and then aggregates them with pretrained token weights to produce sentence similarity. Experimental results on seven STS datasets show that our method improves the performance of almost all language models, with up to 12.7% gain in Spearman's correlation. We also demonstrate that our method is highly explainable and computationally efficient.
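
The abstract describes the pipeline only at a high level, so the following is a minimal illustrative sketch of the general idea and not the authors' algorithm. It assumes each sentence is already given as a list of token vectors, uses cosine similarity for the pairwise token scores, takes each token's best match in the other sentence as its matching score, and averages those scores with per-token weights. The function names (`cosine`, `token_matching_similarity`), the max-based matching rule, the symmetric averaging, and the uniform default weights are all assumptions made for illustration; the paper's pretrained token weights are replaced here by caller-supplied numbers.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors given as lists."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)


def token_matching_similarity(tokens_a, tokens_b, weights_a=None, weights_b=None):
    """Hypothetical token-level sentence similarity (illustrative only).

    tokens_a, tokens_b: non-empty lists of token vectors, one per token.
    weights_a, weights_b: optional per-token weights (assumed stand-ins
    for pretrained token weights); uniform weights are used if omitted.
    """
    # Pairwise token-level similarity: sim[i][j] compares token i of
    # sentence A with token j of sentence B.
    sim = [[cosine(u, v) for v in tokens_b] for u in tokens_a]

    # Matching score of each token: its best match in the other sentence.
    best_a = [max(row) for row in sim]
    best_b = [max(sim[i][j] for i in range(len(tokens_a)))
              for j in range(len(tokens_b))]

    if weights_a is None:
        weights_a = [1.0] * len(tokens_a)
    if weights_b is None:
        weights_b = [1.0] * len(tokens_b)

    # Weighted average of matching scores in each direction.
    score_a = sum(w * s for w, s in zip(weights_a, best_a)) / sum(weights_a)
    score_b = sum(w * s for w, s in zip(weights_b, best_b)) / sum(weights_b)

    # Symmetrize so the result does not depend on argument order.
    return 0.5 * (score_a + score_b)


if __name__ == "__main__":
    # Toy 2-dimensional "token embeddings"; a real system would take
    # these from a language model's token-level outputs.
    sentence_a = [[1.0, 0.0], [0.0, 1.0]]
    sentence_b = [[0.0, 1.0], [1.0, 0.0]]
    print(token_matching_similarity(sentence_a, sentence_b))
```

Because every token contributes its own matching score, a sketch like this also exposes which tokens found a good counterpart and which did not, which is the kind of per-token evidence a single fixed-length sentence embedding does not offer.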