Dataset - LDM

Clickbait Challenge 2017

The Clickbait Challenge 2017 dataset, a collection of social media posts and their corresponding article titles, used for clickbait detection.
- Dataset
- JSON
FHM Dataset

The FHM dataset, a multimodal framework for detecting hateful memes on social media.
- Dataset
Harmful Meme Detection Datasets

The dataset used in the paper for harmful meme detection, consisting of three meme datasets: Harm-C, Harm-P, and FHM.
- Dataset
- JSON
Media Frames Corpus

A dataset of annotated news articles and social media posts for frame classification.
- Dataset
- JSON
Tweet Judgement Classification of Rumours

The dataset used in the paper for tweet-level judgement classification of rumours in social media.
- Dataset
- JSON
Social Media and Network Camera Data for Disaster Response

This dataset contains social media posts and network camera data collected during Hurricane Irma in 2017.
- Dataset
- JSON
Multilingual Offensive Language Identification Dataset (OLID)

A multilingual offensive language identification dataset for social media, containing posts in Arabic, Danish, English, Greek, and Turkish.
- Dataset
- JSON
Retweet Cascade

The dataset used in this paper is a retweet cascade about a gaming YouTube video.
- Dataset
- JSON
Reddit Sarcoidosis Forum Dataset

The dataset analyzed in this study comprises threads and comments from the sarcoidosis forum on the social media platform Reddit.
- Dataset
- JSON
PIPA (People In Photo Albums)

A large-scale dataset of social media photos crawled from Flickr, used for the person recognition task in a social media setting.
- Dataset
- JSON
Breitfeller et al. (2019)

The dataset contains microaggressions in the form of social media posts.
- Dataset
- JSON
Location

The Location dataset contains 446 binary attributes, each representing whether the user visited a certain region or location type.
- Dataset
- JSON
Stanceosaurus: A New Corpus for Multicultural Misinformation Classification

Stanceosaurus is a new corpus of 28,033 tweets in English, Hindi, and Arabic annotated with stance towards 251 misinformation claims.
- Dataset
- JSON
Twitter Dataset

The Twitter Dataset is a collection of tweets annotated with Plutchik's emotions, consisting of tweets in three different languages: English, Dutch, and German.
- Dataset
- JSON
Toxic Comment Classification Challenge dataset

The Toxic Comment Classification Challenge dataset contains comments from Wikipedia organized in six classes: toxic, severe toxic, obscene, threat, insult, and identity hate.
- Dataset
- JSON
Hate Speech Tweets dataset

The Hate Speech Tweets dataset contains over 24,000 English tweets labeled as non-offensive, hate speech, and profanity.
- Dataset
- JSON
Offensive Language Identification Dataset (OLID)

The Offensive Language Identification Dataset (OLID) is a large collection of English tweets annotated for offensive language use, following a three-level hierarchical schema...
- Dataset
- JSON
ReCOVery: A Multimodal Repository for COVID-19 News Credibility Research

A multimodal repository for COVID-19 news credibility research, providing textual, visual, temporal, and network information regarding news content and how news spreads on...
- Dataset
- JSON
Twitter Social Media Dataset

The dataset used in this paper is a collection of social media data from Twitter, including user profiles, follow links, and tweets.
- Dataset
- JSON
Cascades – BBC News Dataset

Cascades datasets constructed in a recursive manner, including Cascades – PAP News Dataset and Cascades – BBC News Dataset.
- Dataset
- JSON

You can also access this registry using the API (see API Docs).

26 datasets found