Text Classification - Groups

X-FORMAL

The X-FORMAL dataset contains pairs of formal and informal texts in four languages: Brazilian Portuguese, French, Italian, and English.
- Dataset
- JSON
GYAFC

The GYAFC dataset is a formality transfer dataset for English that contains aligned formal and informal sentences from two domains: Entertainment & Music and Family &...
- Dataset
- JSON
MARC

The MARC dataset is a multilingual text classification dataset covering 6 languages.
- Dataset
- JSON
Yahoo Answer and Yelp15 review

Two large-scale document classification datasets, Yahoo Answer and Yelp15 review, representing topic classification and sentiment classification data, respectively.
- Dataset
- JSON
Cnews dataset

The Cnews dataset is a collection of news articles from Sina News, filtered to the period from 2005 to 2011. The dataset contains 10 news categories, including sports, entertainment, home...
- Dataset
- JSON
M10

A dataset used in a paper on multi-objective Bayesian optimization for hyperparameter transfer in topic models.
- Dataset
- JSON
20 NewsGroups

A collection of newsgroup posts spanning 20 topics, used in a paper on multi-objective Bayesian optimization for hyperparameter transfer in topic models.
- Dataset
- JSON
MR, Subj, SST-1, SST-2, MPQA

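Sentence-level text classification benchmarks used in the paper: MR (movie review polarity), Subj (subjectivity), SST-1 and SST-2 (Stanford Sentiment Treebank, fine-grained and binary), and MPQA (opinion polarity).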
- Dataset
- JSON
20NEWS Dataset

The 20NEWS dataset used in the paper consists of 18,845 text documents with 20 topic labels.
- Dataset
- JSON
TEL-NLP

The TEL-NLP dataset is a collection of Telugu text data for four NLP tasks: sentiment analysis, emotion identification, hate speech detection, and sarcasm detection.
- Dataset
- JSON
Yelp Dataset

The Yelp Dataset contains 1.6M reviews and 500K tips by 366K users for 61K businesses; 481K business attributes, such as hours, parking availability, ambience; and check-ins for...
- Dataset
- JSON
IMDB Sentiment Classification

The IMDB dataset contains movie reviews labeled by sentiment (positive or negative) and is used for binary sentiment classification.
- Dataset
- JSON
CNN/DailyMail

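CNN/DailyMail is a news summarization dataset of CNN and Daily Mail articles paired with highlight summaries. Sample text shown in the listing: "A bus driver who was seriously injured when he was hit by a steam engine is making good progress, his wife has said."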
- Dataset
- JSON
Ren-CECps

Ren-CECps is a multi-label text classification dataset.
- Dataset
- JSON
RCV1-v2

RCV1-v2 (Reuters Corpus Volume I) is a multi-label text classification dataset.
- Dataset
- JSON
20-Newsgroups dataset

The 20-Newsgroups dataset is a collection of newsgroup documents partitioned across 20 newsgroups.
- Dataset
- JSON
Twitter and Pinterest dataset

Data collected from Twitter and Pinterest and used in the paper's experiments.
- Dataset
- JSON
REDDIT-BINARY dataset

The REDDIT-BINARY dataset contains 2,000 graphs, each labeled as coming from either a question/answer-based or a discussion-based community on the content-aggregation website Reddit.
- Dataset
- JSON
Full

A dataset used for sentiment analysis and topic classification tasks.
- Dataset
- JSON
Polarity

A dataset used for sentiment analysis and topic classification tasks.
- Dataset
- JSON

182 datasets found