A Weakly Supervised Classifier and Dataset of White Supremacist Language

Michael Miller Yoder, Ahmad Diab, David West Brown, Kathleen M Carley

Main: Computational Social Science and Cultural Analytics Main-poster Paper

Poster Session 2: Computational Social Science and Cultural Analytics (Poster)
Conference Room: Frontenac Ballroom and Queen's Quay
Conference Time: July 10, 14:00-15:30 (EDT) (America/Toronto)
Global Time: July 10, Poster Session 2 (18:00-19:30 UTC)
Keywords: hate-speech detection
Abstract: We present a dataset and classifier for detecting the language of white supremacist extremism, a growing issue in online hate speech. Our weakly supervised classifier is trained on large datasets of text from explicitly white supremacist domains paired with neutral and anti-racist data from similar domains. We demonstrate that this approach improves generalization performance to new domains. Incorporating anti-racist texts as counterexamples to white supremacist language mitigates bias.
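
The weak supervision described in the abstract can be illustrated with a minimal sketch, in which each text inherits its label from the kind of source it was collected from rather than from per-text annotation. Everything here is an assumption made for illustration: the source names, the placeholder tokens, the helper functions, and the multinomial Naive Bayes model, which is a simple stand-in and not the classifier used in the paper.

```python
import math
from collections import Counter

# Hypothetical source-level labels: a text is labeled by where it came
# from (weak supervision), not by reading and annotating it individually.
WEAK_LABELS = {
    "white_supremacist_domain": 1,
    "neutral_domain": 0,
    "anti_racist_domain": 0,  # counterexamples that share topic vocabulary
}


def weakly_label(corpora):
    """Flatten {source_name: [texts]} into a list of (text, label) pairs."""
    return [(text, WEAK_LABELS[source])
            for source, texts in corpora.items()
            for text in texts]


def tokenize(text):
    return text.lower().split()


def train_naive_bayes(examples):
    """Count words and documents per label for a multinomial Naive Bayes model."""
    word_counts = {0: Counter(), 1: Counter()}
    doc_counts = Counter()
    for text, label in examples:
        doc_counts[label] += 1
        word_counts[label].update(tokenize(text))
    vocab = set(word_counts[0]) | set(word_counts[1])
    return word_counts, doc_counts, vocab


def predict(model, text):
    """Return the label with the higher log-probability (add-one smoothing)."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    scores = {}
    for label in (0, 1):
        total_words = sum(word_counts[label].values())
        score = math.log(doc_counts[label] / total_docs)
        for tok in tokenize(text):
            if tok in vocab:  # ignore words never seen in training
                score += math.log((word_counts[label][tok] + 1)
                                  / (total_words + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)


# Placeholder tokens stand in for real text; "group_x" is an identity term
# that appears in both white supremacist and anti-racist sources.
corpora = {
    "white_supremacist_domain": ["ideology_term_a group_x",
                                 "ideology_term_b group_x"],
    "neutral_domain": ["weather report today", "match report today"],
    "anti_racist_domain": ["solidarity with group_x", "justice for group_x"],
}
model = train_naive_bayes(weakly_label(corpora))
```

In this toy setup, a text that only mentions the shared identity term is not flagged, because the anti-racist counterexamples give that term evidence under the negative label as well. This is one simplified way to picture the bias mitigation the abstract refers to; the real effect depends on the actual data and model.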