Back to Patterns: Efficient Japanese Morphological Analysis with Feature-Sequence Trie

Naoki Yoshinaga

Main: Phonology, Morphology, and Word Segmentation Main-poster Paper

Poster Session 6: Phonology, Morphology, and Word Segmentation (Poster)

Conference Room: Frontenac Ballroom and Queen's Quay

Conference Time: July 12, 09:00-10:30 (EDT) (America/Toronto)

Global Time: July 12, Poster Session 6 (13:00-14:30 UTC)

Keywords: morphological analysis

Languages: Japanese

Abstract: Accurate neural models are much less efficient than non-neural models and are useless for processing billions of social media posts or handling user queries in real time with a limited budget. This study revisits the fastest pattern-based NLP methods to make them as accurate as possible, thus yielding a strikingly simple yet surprisingly accurate morphological analyzer for Japanese. The proposed method induces reliable patterns from a morphological dictionary and annotated data. Experimental results on two standard datasets confirm that the method exhibits comparable accuracy to learning-based baselines, while boasting a remarkable throughput of over 1,000,000 sentences per second on a single modern CPU. The source code is available at https://www.tkl.iis.u-tokyo.ac.jp/~ynaga/jagger/
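To give a feel for why pattern lookup can be so fast, here is a minimal sketch of generic longest-match segmentation over a character trie. It is not the paper's algorithm or the Jagger implementation: the pattern table `TOY_PATTERNS`, the tag names, and the helpers `build_trie` and `segment` are all hypothetical, and how the paper actually induces and encodes its patterns is not described on this page.

```python
from typing import Dict, List, Optional, Tuple


class TrieNode:
    """One node of a character trie; `tag` is set when a pattern ends here."""

    def __init__(self) -> None:
        self.children: Dict[str, "TrieNode"] = {}
        self.tag: Optional[str] = None


def build_trie(patterns: Dict[str, str]) -> TrieNode:
    """Insert every (surface string -> tag) pattern into a trie."""
    root = TrieNode()
    for surface, tag in patterns.items():
        node = root
        for ch in surface:
            node = node.children.setdefault(ch, TrieNode())
        node.tag = tag
    return root


def segment(text: str, root: TrieNode, unknown_tag: str = "UNK") -> List[Tuple[str, str]]:
    """Greedy longest-match segmentation: one trie walk per emitted token."""
    tokens: List[Tuple[str, str]] = []
    i = 0
    while i < len(text):
        node = root
        # Fallback: emit a single character with the unknown tag.
        best_end, best_tag = i + 1, unknown_tag
        j = i
        while j < len(text) and text[j] in node.children:
            node = node.children[text[j]]
            j += 1
            if node.tag is not None:
                # Remember the longest pattern seen so far.
                best_end, best_tag = j, node.tag
        tokens.append((text[i:best_end], best_tag))
        i = best_end
    return tokens


# Hypothetical toy patterns, for illustration only.
TOY_PATTERNS = {"東京": "NOUN", "東京都": "NOUN", "に": "PARTICLE", "住む": "VERB"}
TRIE = build_trie(TOY_PATTERNS)

print(segment("東京都に住む", TRIE))
```

Each token costs a single walk down the trie with no scoring or search, which is the general property that makes pattern-based analyzers cheap to run. A real analyzer would need far richer patterns than this toy table.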