Online conversation is a ubiquitous way to share information and connect with others, but typing repetitive, idiomatic text costs users considerable time.
This paper demonstrates a simple yet effective cloud-based smart-compose system that improves the efficiency of human-to-human conversation.
Heuristics are designed from several perspectives to achieve a favorable trade-off between suggestion quality and latency.
On the modeling side, a decoder-only model exploits previous turns of the conversation history in a computationally lightweight manner.
In addition, a novel phrase tokenizer is proposed that further reduces latency without degrading composing quality.
A caching mechanism is also applied in the serving framework.
A demo video of the system is available at https://youtu.be/U1KXkaqr60g.
Our phrase tokenizer is open-sourced at https://github.com/tensorflow/text.