[Demo] Fast Whitespace Correction with Encoder-Only Transformers

Hannah Bast, Matthias Hertel, Sebastian Walter

Demo: Generation (demo) Demo Paper

Demo Session 1: Generation (demo) (Poster)
Conference Room: Frontenac Ballroom and Queen's Quay
Conference Time: July 10, 11:00-12:30 (EDT) (America/Toronto)
Global Time: July 10, Demo Session 1 (15:00-16:30 UTC)
Abstract: The goal of whitespace correction is to fix space errors in arbitrary given text. For example, given the text "whi te space correctio nwithTransf or mers", produce "whitespace correction with Transformers". We compare two Transformer-based models, a character-level encoder-decoder model and a byte-level encoder-only model. We find that the encoder-only model is faster and also achieves higher quality. We provide an easy-to-use tool that is over 900 times faster than the previous best tool, with the same high quality. Our tool repairs text at a rate of over 200 kB/s on GPU, with a sequence-averaged F1-score ranging from 87.5% for hard-to-correct text up to 99% for text without any spaces.
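
To make the task concrete, here is a minimal sketch of how a per-character labeling could be turned into repaired text. It assumes a three-label scheme (keep, insert a space before the character, delete the space); the abstract does not spell out the model's output format, so this scheme and the names `apply_repair_labels` and `labels_from_pair` are hypothetical. The labels here are derived from a reference pair as a stand-in for model predictions; no model is involved.

```python
# Hypothetical label scheme (an assumption, not taken from the abstract).
KEEP, INSERT_SPACE, DELETE_SPACE = 0, 1, 2


def apply_repair_labels(text: str, labels: list[int]) -> str:
    """Apply one repair label per input character and return the repaired text."""
    if len(text) != len(labels):
        raise ValueError("need exactly one label per input character")
    out = []
    for ch, label in zip(text, labels):
        if label == DELETE_SPACE:
            if ch != " ":
                raise ValueError("only space characters can be deleted")
            continue  # drop the spurious space
        if label == INSERT_SPACE:
            out.append(" ")  # add the missing space before this character
        out.append(ch)
    return "".join(out)


def labels_from_pair(corrupted: str, correct: str) -> list[int]:
    """Derive labels from two strings that differ only in their spaces.

    Stand-in for model predictions: a real system would predict these labels.
    """
    labels = []
    j = 0  # position in the correct text
    for ch in corrupted:
        if ch == " ":
            if j < len(correct) and correct[j] == " ":
                labels.append(KEEP)
                j += 1
            else:
                labels.append(DELETE_SPACE)
            continue
        if j < len(correct) and correct[j] == " ":
            labels.append(INSERT_SPACE)
            j += 1
        else:
            labels.append(KEEP)
        if j >= len(correct) or correct[j] != ch:
            raise ValueError("texts differ in more than whitespace")
        j += 1
    return labels


if __name__ == "__main__":
    corrupted = "whi te space correctio nwithTransf or mers"
    correct = "whitespace correction with Transformers"
    labels = labels_from_pair(corrupted, correct)
    assert apply_repair_labels(corrupted, labels) == correct
```

Because every input character receives exactly one label, the output can be produced in a single pass over the input, which is one plausible reason a labeling formulation suits an encoder-only model.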