[Industry] pNLP-Mixer: an Efficient all-MLP Architecture for Language

Francesco Fusco, Damian Pascual, Peter Staar, Diego Antognini

Industry: Industry Paper

Session 3: Industry (Oral)
Conference Room: Pier 4&5
Conference Time: July 11, 09:00-10:30 (EDT) (America/Toronto)
Global Time: July 11, Session 3 (13:00-14:30 UTC)
TLDR: Large pre-trained language models based on transformer architectures have drastically changed the natural language processing (NLP) landscape. However, deploying those models for on-device applications in constrained devices such as smart watches is completely impractical due to their size and infere...
Abstract: Large pre-trained language models based on transformer architectures have drastically changed the natural language processing (NLP) landscape. However, deploying these models for on-device applications on constrained devices such as smart watches is completely impractical due to their size and inference cost. As an alternative to transformer-based architectures, recent work on efficient NLP has shown that weight-efficient models can attain competitive performance for simple tasks, such as slot filling and intent classification, with model sizes on the order of one megabyte. This work introduces the pNLP-Mixer architecture, an embedding-free MLP-Mixer model for on-device NLP that achieves high weight efficiency thanks to a novel projection layer. We evaluate a pNLP-Mixer model of only one megabyte in size on two multi-lingual semantic parsing datasets, MTOP and multiATIS. Our quantized model achieves 99.4% and 97.8% of the performance of mBERT on MTOP and multiATIS, respectively, while using 170x fewer parameters. Our model consistently beats the state-of-the-art tiny model (pQRNN), which is twice as large, by a margin of up to 7.8% on MTOP.
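To make the high-level idea of an embedding-free MLP-Mixer concrete, below is a minimal sketch, not the authors' implementation: standard token-mixing and channel-mixing MLP blocks preceded by a linear bottleneck that consumes per-token sparse projection features (random features stand in here for the paper's hash-based projections). The class names, PyTorch framework, and all hyperparameter values (feature size, sequence length, hidden size, depth) are illustrative assumptions, not values from the paper.

```python
# Minimal sketch of an embedding-free MLP-Mixer classifier (assumed PyTorch).
# NOT the pNLP-Mixer reference code; shapes and hyperparameters are illustrative.
import torch
import torch.nn as nn

class MixerBlock(nn.Module):
    """Standard MLP-Mixer block: token mixing followed by channel mixing."""
    def __init__(self, seq_len: int, hidden: int, expansion: int = 2):
        super().__init__()
        self.token_norm = nn.LayerNorm(hidden)
        self.token_mlp = nn.Sequential(
            nn.Linear(seq_len, seq_len * expansion), nn.GELU(),
            nn.Linear(seq_len * expansion, seq_len),
        )
        self.channel_norm = nn.LayerNorm(hidden)
        self.channel_mlp = nn.Sequential(
            nn.Linear(hidden, hidden * expansion), nn.GELU(),
            nn.Linear(hidden * expansion, hidden),
        )

    def forward(self, x):  # x: (batch, seq_len, hidden)
        # Token mixing operates across the sequence dimension.
        x = x + self.token_mlp(self.token_norm(x).transpose(1, 2)).transpose(1, 2)
        # Channel mixing operates across the feature dimension.
        x = x + self.channel_mlp(self.channel_norm(x))
        return x

class TinyMixerClassifier(nn.Module):
    """Embedding-free classifier: a linear bottleneck maps sparse, hash-derived
    per-token features to the hidden size (replacing an embedding table),
    followed by a small stack of Mixer blocks and a pooled classification head."""
    def __init__(self, feat_dim=1024, seq_len=64, hidden=256, depth=2, n_classes=10):
        super().__init__()
        self.bottleneck = nn.Linear(feat_dim, hidden)
        self.blocks = nn.Sequential(*[MixerBlock(seq_len, hidden) for _ in range(depth)])
        self.head = nn.Linear(hidden, n_classes)

    def forward(self, token_features):  # (batch, seq_len, feat_dim)
        x = self.bottleneck(token_features)
        x = self.blocks(x)
        # Mean-pool over tokens for a sentence-level prediction (e.g. intent classification).
        return self.head(x.mean(dim=1))

# Usage: random features stand in for the projection layer's per-token output.
feats = torch.rand(8, 64, 1024)
logits = TinyMixerClassifier()(feats)
print(logits.shape)  # torch.Size([8, 10])
```

Since the only learned components are small linear layers and LayerNorms, a model of this shape stays in the megabyte range after quantization, which is the weight-efficiency argument the abstract makes against transformer-based alternatives.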