-
Knowledge Network and Social Media Based Reputation Management
The dataset used in this research is a collection of employee knowledge network and personal reputation data from social media. -
Twitter datasets
Twitter datasets used in this paper for controversy detection in social media. -
Finding function in form: Compositional character models for open vocabulary ...
A character-level encoder for social media posts trained using supervision from associated hashtags. -
Tweet2Vec: Character-Based Distributed Representations for Social Media
Text from social media provides a set of challenges that can cause traditional NLP approaches to fail. Informal language, spelling errors, abbreviations, and special characters... -
Online Media Monitor (OMM) dataset
The Online Media Monitor (OMM) from the University of Hamburg contributed a dataset of 5,236,660 unlabeled tweets gathered from June 21, 2022, to December 8, 2022. -
Million User Dataset
The Million User Dataset (MUD) consists of all posts by authors who published at least 100 and at most 1000 posts between July 2015 and June 2016. -
Twitter Ideology-detection via Multi-task Multi-relational Embedding
The dataset is used to study political biases of entities and hashtags on Twitter. It contains tweets from politicians, news outlets, and other verified Twitter accounts. -
Characterizing Diabetes, Diet, Exercise, and Obesity on Twitter
The dataset contains 4.5 million tweets related to diabetes, diet, exercise, and obesity. -
Dataset II: Multilingual Forums
The dataset includes discussions from six popular subreddits (in English) and also discussions in French and German, demonstrating the utility of our approach to multilingual... -
English Tweets Dataset
A dataset of English tweets, used as the source domain for domain adaptation. -
CCTT14 Dataset
The CCTT14 dataset is a collection of 994 labeled texts, where each text is annotated with one of 14 categories. -
CCTI14 Dataset
The CCTI14 dataset is a collection of 18,966 labeled images, where each image is annotated with one of 14 categories. -
WeiboScope Dataset
The WeiboScope dataset tracks about 120,000 users from three samples: high-viral potential users, censored users, and random users. The dataset includes 64,022 censored posts... -
Twitter and YouTube Interactions Dataset
The dataset contains 14,133 users with 12,148,994 tweets and 254,659 YouTube video interactions. -
CovidMis20
The CovidMis20 dataset contains around 1,375,592 tweets from February to July 2020, which can be used to develop automatic fake news detection models. -
SoMoSiMu-Bench: A Benchmark for Social Movement Simulation
A Twitter-like environment and a benchmark, SoMoSiMu-Bench, for simulating social media users and evaluating those simulations.
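
The Million User Dataset entry above selects authors by post count (at least 100 and at most 1000 posts). A minimal sketch of that kind of filter is shown below. It is not the dataset authors' code: the `(author, text)` tuple layout and the function names `select_authors` and `filter_posts` are assumptions made for illustration, and the posts are assumed to be already restricted to the collection window.

```python
from collections import Counter

def select_authors(posts, min_posts=100, max_posts=1000):
    """Return authors whose post count lies in [min_posts, max_posts].

    `posts` is an iterable of (author, text) tuples; this layout is a
    hypothetical stand-in, not the dataset's actual schema.
    """
    counts = Counter(author for author, _ in posts)
    return {a for a, n in counts.items() if min_posts <= n <= max_posts}

def filter_posts(posts, min_posts=100, max_posts=1000):
    """Keep only the posts written by authors inside the count bounds."""
    posts = list(posts)  # allow two passes even if a generator is given
    keep = select_authors(posts, min_posts, max_posts)
    return [(a, t) for a, t in posts if a in keep]
```

Both bounds are inclusive, matching the "at least ... at most" wording of the description.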