-
Blogger Dataset
The dataset used in this study is a large, industry-annotated dataset that contains over 20,000 blog users. -
Twitter OOV Word Dataset
The dataset is a collection of English-language tweets, used to study out-of-vocabulary (OOV) words on Twitter. -
Reputable News Index (RNIX)
The Reputable News Index (RNIX) dataset consists of retweet cascades linking articles from 28 reputable news publishers. -
Controversial News Index (CNIX)
The Controversial News Index (CNIX) dataset consists of retweet cascades mentioning articles from 41 online news publishers known for controversial content. -
Anonymous Twitter Dataset
The dataset used in this paper is a collection of tweets from Anonymous accounts and a random sample of non-Anonymous Twitter users. -
Media Frames Corpus
A dataset of annotated news articles and social media posts for frame classification. -
Tweet Judgement Classification of Rumours
The dataset used in the paper for tweet-level judgement classification of rumours in social media. -
Weibo Corpus
A dataset containing unstructured dialogues extracted from Weibo. -
Twitter Corpus
A dataset containing unstructured dialogues extracted from Twitter. -
Twigraph: Discovering and Visualizing Influential Words between Twitter Profiles
The dataset used in the paper is a collection of 1.1M tweets from Twitter, with approximately 3000 tweets per user from various domains such as politics, sports, entertainment,... -
Sina Weibo dataset
The Sina Weibo dataset contains 226.8 million Weibo posts collected over the full course of 2012. -
TuDiabetes Forum
A dataset collected from the TuDiabetes forum, a popular diabetes community operated by the Diabetes Hands Foundation. -
BGnow, TuDiabetes Forum
BGnow dataset is derived from diabetic users who actively share their wellness data on Twitter. TuDiabetes Forum: We also collected a dataset from the TuDiabetes forum, a... -
Diabetes Support Group, BGnow, TuDiabetes Forum
Diabetes Support Group dataset is collected from posts of users who follow and participate in diabetes support groups like “diabeteslife” or “diabetesconnect” on Twitter. BGnow... -
Social Media and Network Camera Data for Disaster Response
This dataset contains social media posts and network camera data collected during Hurricane Irma in 2017. -
Affective Polarization Dataset
The dataset used in the paper is a collection of tweets related to affective polarization, with two scales: aversion against Republicans and aversion against Democrats. -
Multilingual Offensive Language Identification Dataset (OLID)
The dataset is a multilingual offensive language identification dataset for social media, containing posts in Arabic, Danish, English, Greek, and Turkish. -
Retweet Cascade
The dataset used in this paper is a retweet cascade about a gaming YouTube video. -
Flood Detection from Social Media
The dataset is used for training a model to detect floods from social media reports. -
Reddit Sarcoidosis Forum Dataset
The dataset analyzed in this study comprises threads and comments from the sarcoidosis forum on the social media platform Reddit.