Czech-ing the News: Article Trustworthiness Dataset for Czech

Matyas Bohacek, Michal Bravansky, Filip Trhlík, Vaclav Moravec

The 13th Workshop on Computational Approaches to Subjectivity, Sentiment, & Social Media Analysis (Long Paper)

Abstract: We present the Verifee dataset: a multimodal dataset of news articles with fine-grained trustworthiness annotations. We bring together a diverse set of researchers from social, media, and computer sciences to study this interdisciplinary problem holistically and develop a detailed methodology that assesses the texts through the lens of editorial transparency, journalistic conventions, and objective reporting while penalizing manipulative techniques. We collect over $10,000$ annotated articles from $60$ Czech online news sources. Each item is categorized into one of the $4$ proposed classes on the credibility spectrum, ranging from entirely trustworthy articles to deceptive ones, and annotated for its manipulative attributes. We fine-tune prominent sequence-to-sequence language models for the trustworthiness classification task on our dataset and report a best F1 score of $0.53$. We open-source the dataset, annotation methodology, and annotators' instructions in full at https://www.verifee.ai/research/ to facilitate follow-up work.
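
For readers unfamiliar with the reported metric, the following is a minimal sketch of how an F1 score can be computed for a 4-class task such as this one. The abstract does not state which averaging scheme was used, so macro-averaging here is an assumption, as are the integer class ids (0 to 3), the function names, and the toy labels; none of them come from the Verifee release.

```python
from collections import Counter

# The dataset defines 4 trustworthiness classes; representing them as the
# integer ids 0..3 is an assumption made only for this illustration.
NUM_CLASSES = 4


def per_class_f1(y_true, y_pred, num_classes=NUM_CLASSES):
    """Return a list with the F1 score of each class, treated one-vs-rest."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative
    scores = []
    for c in range(num_classes):
        # F1 = 2TP / (2TP + FP + FN); defined as 0.0 when the class never occurs.
        denom = 2 * tp[c] + fp[c] + fn[c]
        scores.append(2 * tp[c] / denom if denom else 0.0)
    return scores


def macro_f1(y_true, y_pred, num_classes=NUM_CLASSES):
    """Unweighted mean of the per-class F1 scores (macro averaging, assumed)."""
    scores = per_class_f1(y_true, y_pred, num_classes)
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Toy labels, not taken from the dataset.
    gold = [0, 0, 1, 1, 2, 2, 3, 3]
    pred = [0, 1, 1, 1, 2, 3, 3, 3]
    print(round(macro_f1(gold, pred), 3))
```

Macro-averaging weights all four classes equally regardless of how many articles fall into each; a micro- or support-weighted average could give a different number on an imbalanced dataset.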