[Industry] CUPID: Curriculum Learning Based Real-Time Prediction using Distillation

Arindam Bhattacharya; Ankith Ms; Ankit Gandhi; Vijay Huddar; Atul Saroop; Rahul Bhagat

Industry: Industry Paper

Session 5: Industry (Poster)

Conference Room: Frontenac Ballroom and Queen's Quay

Conference Time: July 11, 16:15-17:45 (EDT) (America/Toronto)

Global Time: July 11, Session 5 (20:15-21:45 UTC)

Abstract: Relevance in E-commerce Product Search is crucial for providing customers with accurate results that match their query intent. With recent advancements in NLP and Deep Learning, Transformers have become the default choice for relevance classification tasks. In such a setting, the relevance model takes the query text and product title as input features and estimates whether the product is relevant to the customer query. While cross-attention in Transformers enables more accurate relevance prediction, its high evaluation latency makes it unsuitable for real-time prediction, in which thousands of products must be evaluated against a user query within a few milliseconds. To address this issue, we propose CUPID (Curriculum learning based real-time Prediction using Distillation), which uses knowledge distillation within a curriculum learning setting to learn a simpler architecture that can be evaluated within low latency budgets. In a bilingual relevance prediction task, our approach shows a 302 bps improvement on English and a 676 bps improvement on low-resource Arabic, while maintaining low evaluation latency on CPUs.
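The abstract combines two ideas: a student trained to match a slower teacher (knowledge distillation), and training data presented from easy to hard (curriculum learning). The following is a minimal sketch of how these could fit together, not the paper's method. Every name here (`distillation_loss`, `curriculum_stages`, the `teacher_prob` field, the `alpha` weight) is hypothetical, and the difficulty measure, the gap between the teacher's score and the label, is an assumption chosen for illustration; the abstract does not say how CUPID defines difficulty or its loss.

```python
import math


def distillation_loss(student_prob, teacher_prob, label, alpha=0.5):
    """Blend hard-label and soft-label binary cross-entropy.

    alpha weights the ground-truth label; (1 - alpha) weights the
    teacher's predicted probability. Both weights are assumptions.
    """
    eps = 1e-12

    def bce(p, target):
        p = min(max(p, eps), 1 - eps)  # clamp to avoid log(0)
        return -(target * math.log(p) + (1 - target) * math.log(1 - p))

    return alpha * bce(student_prob, label) + (1 - alpha) * bce(student_prob, teacher_prob)


def curriculum_stages(examples, num_stages=3):
    """Return cumulative training sets ordered from easy to hard.

    Assumed difficulty: |label - teacher_prob|, so examples the teacher
    scores confidently and correctly come first.
    """
    ordered = sorted(examples, key=lambda ex: abs(ex["label"] - ex["teacher_prob"]))
    stages = []
    for stage in range(1, num_stages + 1):
        cutoff = math.ceil(len(ordered) * stage / num_stages)
        stages.append(ordered[:cutoff])  # each stage adds harder examples
    return stages


# Toy query/title pairs with made-up teacher scores (illustrative only).
examples = [
    {"query": "red shoes", "title": "Red running shoes", "label": 1, "teacher_prob": 0.97},
    {"query": "red shoes", "title": "Blue sandals", "label": 0, "teacher_prob": 0.40},
    {"query": "usb cable", "title": "USB-C charging cable", "label": 1, "teacher_prob": 0.85},
    {"query": "usb cable", "title": "Phone case", "label": 0, "teacher_prob": 0.05},
]

for i, stage in enumerate(curriculum_stages(examples, num_stages=2), start=1):
    print(f"stage {i}: {[ex['title'] for ex in stage]}")
```

In a real training loop, the student's predicted probability for each pair in the current stage would be passed to `distillation_loss` and the result minimized before moving on to the next, larger stage.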