-
Clickbait Challenge 2017
A collection of social media posts and their corresponding article titles, used for clickbait detection. -
FHM Dataset
A multimodal dataset for detecting hateful memes on social media. -
Harmful Meme Detection Datasets
A collection of three meme datasets used for harmful meme detection: Harm-C, Harm-P, and FHM. -
Media Frames Corpus
A dataset of annotated news articles and social media posts for frame classification. -
Tweet Judgement Classification of Rumours
The dataset used in the paper for tweet-level judgement classification of rumours in social media. -
Social Media and Network Camera Data for Disaster Response
This dataset contains social media posts and network camera data collected during Hurricane Irma in 2017. -
Multilingual Offensive Language Identification Dataset (OLID)
A multilingual offensive language identification dataset for social media, containing posts in Arabic, Danish, English, Greek, and Turkish. -
Retweet Cascade
The dataset used in this paper is a retweet cascade about a gaming YouTube video. -
Reddit Sarcoidosis Forum Dataset
The dataset analyzed in this study comprises threads and comments from the sarcoidosis forum on the social media platform Reddit. -
PIPA (People In Photo Albums)
A large-scale dataset of social media photos crawled from Flickr, used for person recognition in a social media setting. -
Breitfeller et al. (2019)
The dataset contains microaggressions in the form of social media posts. -
Stanceosaurus: A New Corpus for Multicultural Misinformation Classification
Stanceosaurus is a new corpus of 28,033 tweets in English, Hindi, and Arabic annotated with stance towards 251 misinformation claims. -
Twitter Dataset
A collection of tweets in three languages (English, Dutch, and German), annotated with Plutchik's emotions. -
Toxic Comment Classification Challenge dataset
The Toxic Comment Classification Challenge dataset contains comments from Wikipedia organized in six classes: toxic, severe toxic, obscene, threat, insult, and identity hate. -
Hate Speech Tweets dataset
The Hate Speech Tweets dataset contains over 24,000 English tweets, each labeled as non-offensive, hate speech, or profanity. -
Offensive Language Identification Dataset (OLID)
The Offensive Language Identification Dataset (OLID) is a large collection of English tweets annotated for offensive language use, following a three-level hierarchical schema... -
ReCOVery: A Multimodal Repository for COVID-19 News Credibility Research
A multimodal repository for COVID-19 news credibility research, providing textual, visual, temporal, and network information regarding news content and how news spreads on... -
Twitter Social Media Dataset
The dataset used in this paper is a collection of social media data from Twitter, including user profiles, follow links, and tweets. -
Cascades – BBC News Dataset
Cascades datasets constructed in a recursive manner, including Cascades – PAP News Dataset and Cascades – BBC News Dataset.