Transformer Language Models Handle Word Frequency in Prediction Head
Goro Kobayashi, Tatsuki Kuribayashi, Sho Yokoi, Kentaro Inui
Findings: Interpretability and Analysis of Models for NLP Findings Paper
Session 1: Interpretability and Analysis of Models for NLP (Virtual Poster)
Conference Room: Pier 7&8
Conference Time: July 10, 11:00-12:30 (EDT) (America/Toronto)
Global Time: July 10, Session 1 (15:00-16:30 UTC)

Spotlight Session: Spotlight - Metropolitan West (Spotlight)
Conference Room: Metropolitan West
Conference Time: July 10, 19:00-21:00 (EDT) (America/Toronto)
Global Time: July 10, Spotlight Session (23:00-01:00 UTC)

Keywords: probing

  
Abstract:
The prediction head is a crucial component of Transformer language models.
Despite its direct impact on prediction, this component has often been overlooked in analyses of Transformers.
In this study, we investigate the inner workings of the prediction head, focusing specifically on its bias parameters.
Our experiments with BERT and GPT-2 models reveal that the biases in their word prediction heads play a significant role in the models' ability to reflect word frequency in a corpus, aligning with the logit adjustment method commonly used in long-tailed learning.
We also quantify the effect of controlling the biases in practical auto-regressive text generation scenarios;
under a particular setting, more diverse text can be generated without compromising text quality.
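
To make the link between the head's bias and word frequency concrete, the toy sketch below may help. It is not the authors' code or experimental setup. It assumes a hypothetical five-word vocabulary, a bias set exactly to the log of each word's corpus frequency (the form used in logit adjustment), and a hypothetical `bias_scale` coefficient as one possible way of "controlling" the bias. All names and numbers are illustrative only.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def head_probs(hidden, weight, bias, bias_scale=1.0):
    """Toy prediction head: logits = W h + bias_scale * b.

    bias_scale is a hypothetical knob for this sketch, not a parameter
    of BERT or GPT-2; 1.0 keeps the bias, 0.0 removes it.
    """
    logits = [
        sum(w * h for w, h in zip(row, hidden)) + bias_scale * b
        for row, b in zip(weight, bias)
    ]
    return softmax(logits)

def entropy(probs):
    """Shannon entropy in nats; higher means a flatter distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

# Assumed toy corpus frequencies for a 5-word vocabulary (sum to 1).
freq = [0.6, 0.2, 0.1, 0.07, 0.03]
# Assumption: the bias equals log frequency, as in logit adjustment.
bias = [math.log(f) for f in freq]
# Arbitrary illustrative output weights (5 words x 2 hidden dims).
weight = [[0.2, -0.1], [0.0, 0.3], [-0.2, 0.1], [0.1, 0.1], [0.3, -0.3]]
hidden = [0.5, -0.5]

for scale in (1.0, 0.5, 0.0):
    probs = head_probs(hidden, weight, bias, scale)
    print(scale, [round(p, 3) for p in probs], round(entropy(probs), 3))
```

In this toy setting, a zero hidden state with the full bias reproduces the assumed corpus frequencies, while shrinking `bias_scale` toward zero flattens the distribution toward rarer words. Whether this matches how trained models behave, and which setting preserves text quality, is what the paper investigates empirically.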
          