Dataset - LDM

Harassment (HAR) dataset

A dataset used for hate speech detection on Twitter.
HATE dataset

A dataset used for hate speech detection on Twitter.
Sexist/Racist (SR) dataset

A dataset used for hate speech detection on Twitter.
Hate Speech Detection on Twitter

A dataset used for hate speech detection on Twitter.
FRENK Dataset

The FRENK Dataset is a collection of Slovene and English comments annotated for hate speech.
Wikipedia Hate Speech Dataset

The Wikipedia Hate Speech Dataset is a collection of user comments annotated for hate speech.
Reddit Hate Detection Dataset

The Reddit Hate Detection Dataset is a collection of Reddit comments annotated for hate speech.
The Gab Hate Corpus

The Gab Hate Corpus is a collection of 27k posts annotated for hate speech.
PEACE: Cross-Platform Hate Speech Detection

The PEACE dataset is a collection of social media posts and comments annotated for hate speech.
COVID-HATE

The dataset contains tweets expressing anti-Asian hate during the COVID-19 pandemic, as well as counter-speech tweets supporting people of Asian ethnicity.
PAN Profiling Hate Speech Spreader Task

The PAN Profiling Hate Speech Spreader Task provides a dataset in English and Spanish, with samples collected from Twitter.
CREHate

CREHate is a cross-cultural English hate speech dataset comprising 1,580 posts from five English-speaking countries: Australia (AU), the United Kingdom (GB), Singapore (SG), the United States (US), and South Africa (ZA).
HateXplain

The HateXplain dataset contains 20,000 posts from Gab and Twitter, annotated with hate/offensive/normal labels.
Bengali Hate Speech Dataset

The Bengali Hate Speech Dataset is a large-scale dataset for hate speech detection in the Bengali language. It contains 8,087 labelled examples, categorized into political,...
Multimodal Hate Speech Detection in Bengali

A multimodal hate speech detection dataset for the Bengali language.
DOLaH

A dataset containing 2,026 Facebook posts collected from Twitter, labeled as offensive or non-offensive.
Hate Speech Detection Dataset

A collection of tweets containing hate speech and offensive language, annotated with their sentiment.
Twitter Hate Speech Dataset

A large-scale dataset of tweets, retweets, user activity history, and follower networks, comprising over 161 million tweets from more than 41 million unique users.
HuggingFace DLab dataset

The HuggingFace DLab dataset is used for assessing fair target-group detection. It contains 135,556 posts with explicit annotations for the target group(s).
Human-machine collaboration approaches to build a dialogue dataset for hate speech countering

A dialogue dataset for hate speech countering, built using human-machine collaboration approaches.

You can also access this registry using the API (see API Docs).

20 datasets found