Text Classification - Groups

rcv1

The rcv1 dataset is a multiclass text classification dataset.
WebKB

The dataset used in this paper is a probabilistic logic programming dataset: a probabilistic version of the WebKB dataset.
Reuters-8

The Reuters-8 dataset is a collection of news articles from Reuters.
20Newsgrp

The 20Newsgrp dataset is a collection of posts from 20 different newsgroups.
iPosts dataset

The independently posted tweets dataset (henceforth: iPosts), which we used for contradiction detection between independently emerging claim-initiating tweets.
Threads RTE dataset

The dataset on which the authors run disagreement reply detection (henceforth: Threads) was converted by us to RTE format based on the threaded conversations labeled in this...
Wikipedia Neutrality Corpus

This dataset is used to test the ability of large language models to detect and correct biased Wikipedia edits according to Wikipedia's Neutral Point of View (NPOV) policy.
Yelp reviews polarity dataset

A binary (positive/negative) sentiment classification dataset built from Yelp reviews.
News

The News dataset consists of 5000 randomly sampled news articles from the NY Times corpus. It simulates the opinions of media consumers on news items. The units are different...
X-FORMAL

The X-FORMAL dataset contains pairs of formal and informal texts in four languages: Brazilian Portuguese, French, Italian, and English.
GYAFC

The GYAFC dataset is a formality transfer dataset for English that contains aligned formal and informal sentences from two domains: Entertainment & Music and Family &...
MARC

The MARC dataset is a multilingual text classification dataset covering six languages.
Yahoo Answer and Yelp15 review

Two large-scale document classification datasets, Yahoo Answer and Yelp15 review, representing topic classification and sentiment classification, respectively.
Cnews dataset

The Cnews dataset is a collection of news articles from Sina News, filtered from 2005 to 2011. The dataset contains 10 categories of news, including sports, entertainment, home...
M10

Used in a paper on multi-objective Bayesian optimization for hyperparameter transfer in topic models.
20 NewsGroups

Used in a paper on multi-objective Bayesian optimization for hyperparameter transfer in topic models.
MR, Subj, SST-1, SST-2, MPQA

The datasets used in this paper for the text classification task.
20NEWS Dataset

The dataset used in the paper is the 20NEWS dataset, consisting of 18,845 text documents with 20 topic labels.
TEL-NLP

The TEL-NLP dataset is a collection of Telugu text data for four NLP tasks: sentiment analysis, emotion identification, hate speech detection, and sarcasm detection.
Yelp Dataset

The Yelp Dataset contains 1.6M reviews and 500K tips by 366K users for 61K businesses; 481K business attributes, such as hours, parking availability, ambience; and check-ins for...

211 datasets found