You Are What You Read: Inferring Personality From Consumed Textual Content

Adam Sutton, Almog Simchon, Matthew Edwards, Stephan Lewandowsky

The 13th Workshop on Computational Approaches to Subjectivity, Sentiment, & Social Media Analysis (Long Paper)

Abstract: In this work, we use consumed text to infer Big-5 personality inventories, using data we collected from the social media platform Reddit. We test our model on two datasets, sampled from participants who consumed either fiction content ($N = 913$) or news content ($N = 213$). We show that state-of-the-art models from a similar task using authored text do not translate well to this task, with average correlations of $r=.06$ between the model's predictions and ground-truth personality inventory dimensions. We propose an alternative method of generating average personality labels for each piece of text consumed, under which our model achieves correlations as high as $r=.34$ when predicting personality from the text being read.
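
As an illustration of the two quantities the abstract refers to, the following is a minimal sketch, not the authors' implementation. It assumes each text's label is the plain mean of the Big-5 scores of the participants who consumed it, and that agreement is measured with Pearson's $r$. The function names, the data layout, and the toy scores are all hypothetical.

```python
from collections import defaultdict
from math import sqrt


def average_labels_per_text(readers, consumption):
    """Average the trait scores of every reader who consumed each text.

    readers:     {reader_id: {trait: score}}   (hypothetical layout)
    consumption: {reader_id: [text_id, ...]}   (hypothetical layout)
    Returns {text_id: {trait: mean score over that text's readers}}.
    """
    sums = defaultdict(lambda: defaultdict(float))
    counts = defaultdict(int)
    for reader_id, text_ids in consumption.items():
        # Count each reader once per text, even if the text is listed twice.
        for text_id in set(text_ids):
            counts[text_id] += 1
            for trait, score in readers[reader_id].items():
                sums[text_id][trait] += score
    return {
        text_id: {trait: total / counts[text_id] for trait, total in traits.items()}
        for text_id, traits in sums.items()
    }


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length sequences of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Toy data, invented purely for illustration.
readers = {
    "u1": {"openness": 4.0},
    "u2": {"openness": 2.0},
    "u3": {"openness": 3.0},
}
consumption = {"u1": ["t1", "t2"], "u2": ["t1"], "u3": ["t2", "t3"]}

labels = average_labels_per_text(readers, consumption)
```

Under these assumptions, a model trained on texts would be evaluated by passing its per-dimension predictions and the corresponding labels to `pearson_r`, one Big-5 dimension at a time.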