-
Online Media Monitor (OMM) dataset
The Online Media Monitor (OMM) from the University of Hamburg contributed a dataset of 5,236,660 unlabeled tweets gathered from June 21, 2022, to December 8, 2022. -
Million User Dataset
The Million User Dataset (MUD) consists of all posts by authors who published at least 100 and at most 1000 posts between July 2015 and June 2016. -
Twitter Ideology-detection via Multi-task Multi-relational Embedding
The dataset is used to study political biases of entities and hashtags on Twitter. It contains tweets from politicians, news outlets, and other verified Twitter accounts. -
Characterizing Diabetes, Diet, Exercise, and Obesity on Twitter
The dataset contains 4.5 million tweets related to diabetes, diet, exercise, and obesity. -
Dataset II: Multilingual Forums
The dataset includes discussions from six popular subreddits (in English) and also discussions in French and German, demonstrating the utility of our approach to multilingual... -
English Tweets Dataset
A dataset of English tweets, used as the source domain for domain adaptation. -
CCTT14 Dataset
The CCTT14 dataset is a collection of 994 labeled texts, where each text is annotated with one of 14 categories. -
CCTI14 Dataset
The CCTI14 dataset is a collection of 18,966 labeled images, where each image is annotated with one of 14 categories. -
WeiboScope Dataset
The WeiboScope dataset tracks about 120,000 users from three samples: high-viral potential users, censored users, and random users. The dataset includes 64,022 censored posts... -
Twitter and YouTube Interactions Dataset
The dataset contains 14,133 users with 12,148,994 tweets and 254,659 YouTube video interactions. -
CovidMis20
The CovidMis20 dataset contains 1,375,592 tweets from February to July 2020, which can be used to develop automatic fake news detection models. -
SoMoSiMu-Bench: A Benchmark for Social Movement Simulation
A Twitter-like environment and a benchmark, SoMoSiMu-Bench, for running and evaluating social media user simulations. -
Metropolitan data
The Metropolitan data comes from a study of emotions expressed through Twitter messages posted from locations around Los Angeles County. -
Big data and big values: When companies need to rethink themselves
The dataset contains more than 94,000 tweets related to the core values of the firms listed in Fortune’s ranking of the World’s Most Admired Companies (2013-2017). -
Personality Traits and Echo Chambers on Facebook
The dataset contains 30K users who made more than 3M comments in a time span of 5 years (Jan 2010 — Dec 2014) on 413 US public Facebook pages supporting conflicting narratives —... -
GeoUK 2022 Tweets Dataset
A dataset of geolocated tweets from 2022, filtered to keep only tweets posted in the UK.