Dataset - LDM

Toxic Comment Classification Challenge dataset

The Toxic Comment Classification Challenge dataset contains Wikipedia comments labeled with six classes: toxic, severe toxic, obscene, threat, insult, and identity hate.
- Dataset
- JSON
Hate Speech Tweets dataset

The Hate Speech Tweets dataset contains over 24,000 English tweets, each labeled as non-offensive, hate speech, or profanity.
- Dataset
- JSON
Classification of Research Citations (CRC)

A dataset of 150 computer science research papers, manually annotated with class labels for sentiment analysis.
- Dataset
- JSON
Stream TwitterSentiment

Stream TwitterSentiment is a sentiment analysis dataset of tweets, used to evaluate active stream learning algorithms for polarity learning.
- Dataset
- JSON
StreamJi

StreamJi is a dataset of product reviews that focuses on product features. It is used to evaluate active stream learning algorithms for polarity learning.
- Dataset
- JSON
Hatespeech

The Hatespeech dataset is a collection of tweets containing terms from hate speech lexicons.
- Dataset
- JSON
Yelp 2014

IMDB and Yelp datasets for sentiment classification.
- Dataset
- JSON
Yelp 2013

IMDB and Yelp datasets for sentiment classification.
- Dataset
- JSON
IMDB and Yelp datasets

IMDB and Yelp are datasets used for sentiment analysis and author identification.
- Dataset
- JSON
Entity-Specific Sentiment Classification of Yahoo News Comments

The dataset is used for entity-specific sentiment classification of Yahoo News comments.
- Dataset
- JSON
Sentiment Training Dataset

Sentiment training dataset for LABDet, a robust, language-agnostic bias probing method that quantifies intrinsic bias in monolingual pretrained language models (PLMs).
- Dataset
- JSON
Tweet Sentiment Extraction

The Tweet Sentiment Extraction dataset contains positive, negative, and neutral tweets with human-annotated rationales.
- Dataset
- JSON
Movie Reviews

The Movie Reviews dataset contains positive and negative movie reviews, with human-annotated rationales that support each classification.
- Dataset
- JSON
DynaSent

The DynaSent dataset contains approximately 122,000 sentences, each labeled as positive, neutral, or negative.
- Dataset
- JSON
MoodyLyricsPN

MoodyLyricsPN is a larger collection of 5,000 songs, labeled only as positive or negative.
- Dataset
- JSON
MoodyLyrics4Q

MoodyLyrics4Q is a dataset of 2,000 songs that fully satisfies the four requirements set out in its source paper.
- Dataset
- JSON
VideoIC

VideoIC is a dataset for automatic live video commenting.
- Dataset
- JSON
Livebot

Livebot is a dataset for automatic live video commenting.
- Dataset
- JSON
Sentiment-oriented Transformer-based Variational Autoencoder Network for Live...

Sentiment-oriented Transformer-based Variational Autoencoder (So-TVAE) for Live Video Commenting
- Dataset
- JSON
New York Times and 20Newsgroups datasets

The paper uses two datasets: the New York Times dataset and the 20Newsgroups dataset.
- Dataset
- JSON

You can also access this registry using the API (see API Docs).

83 datasets found