Text Classification - Groups

Clickbait Challenge 2017

The Clickbait Challenge 2017 dataset is a collection of social media posts and their corresponding article titles, used for clickbait detection.
- Dataset
- JSON
Fake News Challenge Stage 1 (FNC-1)

The FNC-1 dataset supports stance detection framed as a supervised classification task: the goal is to automatically predict the stance of an article body relative to a headline (agrees, disagrees, discusses, or unrelated).
- Dataset
- JSON
CAL500

In text categorization, a document may be associated with a range of topics, such as science, entertainment, and news.
- Dataset
- JSON
Semeval-2016 Task 6: Detecting stance in tweets

The dataset for SemEval-2016 Task 6, which addresses detecting stance in tweets.
- Dataset
- JSON
Rotten Tomatoes

The Rotten Tomatoes dataset has 5331 positive and 5331 negative review sentences.
- Dataset
- JSON
HONEST Race

A dataset used for the toxicity and stereotype mitigation task; it consists of 25 thousand examples of positive and negative movie reviews.
- Dataset
- JSON
IMDb Review Dataset

The IMDb review dataset is used for a positive generation task.
- Dataset
- JSON
AmazonCat-13K

A dataset used in the LightDXML paper for extreme multi-label classification.
- Dataset
- JSON
The Pile dataset

The Pile is a large-scale dataset containing 800 GB of text data.
- Dataset
- JSON
LM-Extraction benchmark

The LM-Extraction benchmark is derived from The Pile (Gao et al., 2020) dataset, which contains 15,000 pairs of prefixes and suffixes derived from The Pile dataset (Gao et al.,...
- Dataset
- JSON
TREC05 spam corpus

The TREC05 spam corpus contains 39,999 real ham and 52,790 spam emails.
- Dataset
- JSON
Neural Speed Reading with Structural-Jump-LSTM

The dataset consists of 108 news headlines, 72 of which are true and 36 of which are false.
- Dataset
- JSON
gisette

The gisette dataset is a collection of 20,000 text documents, each containing a single sentence.
- Dataset
- JSON
epsilon

The epsilon dataset is a collection of 50,000 text documents, each containing a single sentence.
- Dataset
- JSON
Dual-sparse Regularized Randomized Reduction

The paper proposes dual-sparse regularized randomized reduction methods for classification. The dataset used in the paper is the RCV1-binary dataset.
- Dataset
- JSON
Liar, Liar Pants on Fire: A New Benchmark Dataset for Fake News Detection

A new benchmark dataset for fake news detection, containing 12,836 short statements labeled for truthfulness, subject, context/venue, speaker, state, party, and prior history.
- Dataset
- JSON
kdda

A dataset used for compressed sensing, Lasso regression, and logistic Lasso regression problems.
- Dataset
- JSON
news20-binary

The dataset used in the paper is the news20-binary dataset.
- Dataset
- JSON
url

The dataset used in the paper is the url dataset.
- Dataset
- JSON
E2006-log1p

The dataset used in the paper is the E2006-log1p dataset.
- Dataset
- JSON

84 datasets found