- FRENK Dataset
  The FRENK Dataset is a collection of Slovene and English comments annotated for hate speech.
- Wikipedia Hate Speech Dataset
  The Wikipedia Hate Speech Dataset is a collection of user comments annotated for hate speech.
- Reddit Hate Detection Dataset
  The Reddit Hate Detection Dataset is a collection of Reddit comments annotated for hate speech.
- The Gab Hate Corpus
  The Gab Hate Corpus is a collection of 27k posts annotated for hate speech.
- PEACE: Cross-Platform Hate Speech Detection
  The PEACE dataset is a collection of social media posts and comments annotated for hate speech.
- Reputable News Index (RNIX)
  The Reputable News Index (RNIX) dataset consists of retweet cascades linking articles from 28 reputable news publishers.
- Controversial News Index (CNIX)
  The Controversial News Index (CNIX) dataset consists of retweet cascades mentioning articles from 41 online news publishers known for controversial content.
- Twitter Depression Detection Dataset
  The dataset used in this study contains more than 300,000 tweets from 111 user profiles.
- Anonymous Twitter Dataset
  The dataset used in this paper is a collection of tweets from Anonymous accounts and a random sample of non-Anonymous Twitter users.
- WeChat dataset
  The fake news detection task can be defined as a binary classification problem in which each news article is either real (y = 0) or fake (y = 1). The WeChat dataset is collected from...
- Statistical Analysis of Perspective Scores on Hate Speech Detection
  Hate speech detection has become a hot topic in recent years due to the exponential growth of offensive language in social media.
- Twigraph: Discovering and Visualizing Influential Words between Twitter Profiles
  The dataset used in the paper is a collection of 1.1M tweets, with approximately 3000 tweets per user, from various domains such as politics, sports, entertainment,...
- Sina Weibo dataset
  The Sina Weibo dataset contains 226.8 million Weibo posts collected over the full course of 2012.
- TuDiabetes Forum
  A dataset collected from the TuDiabetes forum, a popular diabetes community operated by the Diabetes Hands Foundation.
- BGnow, TuDiabetes Forum
  The BGnow dataset is derived from diabetic users who actively share their wellness data on Twitter. TuDiabetes Forum: We also collected a dataset from the TuDiabetes forum, a...
- Diabetes Support Group, BGnow, TuDiabetes Forum
  The Diabetes Support Group dataset is collected from posts of users who follow and participate in diabetes support groups such as “diabeteslife” or “diabetesconnect” on Twitter. BGnow...
- Twitter Name Tagging (TNT) and Broad Twitter Corpus (BTC)
  The Twitter Name Tagging (TNT) and Broad Twitter Corpus (BTC) datasets are used for named entity recognition in social media.
- Flood Detection from Social Media
  The dataset is used for training a model to detect floods from social media reports.