Nonparametric Masked Language Modeling
Sewon Min, Weijia Shi, Mike Lewis, Xilun Chen, Wen-tau Yih, Hannaneh Hajishirzi, Luke Zettlemoyer
Findings: Large Language Models (Findings Paper)
    Session 7: Large Language Models (Virtual Poster)
    
Conference Room: Pier 7&8 
    Conference Time: July 12, 11:00-12:30 (EDT) (America/Toronto)
    Global Time: July 12, Session 7 (15:00-16:30 UTC)
    
    
  
    Spotlight Session: Spotlight - Metropolitan Centre
    
Conference Room: Metropolitan Centre 
    Conference Time: July 10, 19:00-21:00 (EDT) (America/Toronto)
    Global Time: July 10, Spotlight Session (23:00-01:00 UTC)
    
    
  
          Keywords:
          pre-training, prompting, retrieval-augmented models
        
        
        
        
          TLDR:
          Existing language models (LMs) predict tokens with a softmax over a finite vocabulary, which can make it difficult to predict rare tokens or phrases. We introduce NPM, the first nonparametric masked language model that replaces this softmax with a nonparametric distribution over every phrase in a reference corpus.
        
  
  
  
    
            Abstract:
            Existing language models (LMs) predict tokens with a softmax over a finite vocabulary, which can make it difficult to predict rare tokens or phrases. We introduce NPM, the first nonparametric masked language model that replaces this softmax with a nonparametric distribution over every phrase in a reference corpus. NPM fills in the [MASK] solely from retrieving a token from a text corpus. We show that NPM can be efficiently trained with a contrastive objective and an in-batch approximation to full corpus retrieval. Zero-shot evaluation on 16 tasks including classification, fact probing and question answering demonstrates that NPM outperforms significantly larger parametric models, either with or without a retrieve-and-generate approach. It is particularly better at dealing with rare patterns (word senses or facts) and predicting rare or nearly unseen words (e.g., non-Latin script). We release the model and code at github.com/facebookresearch/NPM.
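
The abstract outlines the core mechanism: a [MASK] is filled not by a softmax over a fixed vocabulary, but by a distribution over token occurrences retrieved from a reference corpus. The sketch below illustrates that idea in PyTorch. The function and variable names (e.g., nonparametric_mask_fill, corpus_token_embs) are hypothetical and do not reflect the released facebookresearch/NPM API; a real setup would use the trained NPM encoder and an approximate nearest-neighbor index over the full corpus.

```python
# Illustrative sketch only: nonparametric [MASK] prediction by retrieval over a
# reference corpus, rather than a softmax over a fixed vocabulary.
# All names here are hypothetical; see github.com/facebookresearch/NPM for the
# authors' actual implementation.

import torch
import torch.nn.functional as F


def nonparametric_mask_fill(mask_hidden, corpus_token_embs, corpus_tokens, temperature=1.0):
    """Fill a [MASK] by retrieving a token from a reference corpus.

    mask_hidden:       (d,)   encoder representation of the masked position
    corpus_token_embs: (N, d) precomputed embeddings of every corpus token occurrence
    corpus_tokens:     list of N corpus token strings aligned with the embeddings
    """
    # Similarity of the masked position to every token occurrence in the corpus,
    # turned into a distribution over corpus positions (not vocabulary entries).
    scores = corpus_token_embs @ mask_hidden / temperature   # (N,)
    probs = F.softmax(scores, dim=0)

    # Aggregate probability mass over identical surface tokens, so a rare word
    # can be predicted as long as it occurs somewhere in the corpus.
    token_prob = {}
    for p, tok in zip(probs.tolist(), corpus_tokens):
        token_prob[tok] = token_prob.get(tok, 0.0) + p
    return max(token_prob, key=token_prob.get)


# Toy usage with random embeddings; a trained encoder and a corpus index would
# replace these in practice.
d, N = 8, 5
corpus_tokens = ["Seattle", "Tokyo", "Seattle", "Paris", "Cairo"]
corpus_token_embs = torch.randn(N, d)
mask_hidden = corpus_token_embs[0] + 0.01 * torch.randn(d)  # representation near "Seattle"
print(nonparametric_mask_fill(mask_hidden, corpus_token_embs, corpus_tokens))
```

The contrastive training with an in-batch approximation mentioned in the abstract would supervise the encoder so that the masked-position representation scores its true corpus token above in-batch negatives; that training step is omitted from this sketch.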