Clickbait Challenge 2017
The Clickbait Challenge 2017 dataset is a collection of social media posts and their corresponding article titles, used for clickbait detection.
Fake News Challenge Stage 1 (FNC-1)
The FNC-1 dataset supports a supervised stance detection task: for each pair of a news headline and an article body, the goal is to automatically predict whether the body agrees with, disagrees with, discusses, or is unrelated to the headline.
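FNC-1 was scored with a weighted metric rather than plain accuracy: about 25% of the weight goes to separating related from unrelated headline/body pairs and 75% to getting the exact stance (agree, disagree, discuss) right on related pairs. The sketch below reimplements that rule as we understand it from the organizers' scorer; `fnc_score` and `relative_score` are our own hypothetical names, and numbers should be checked against the official scoring script before being reported.

```python
from typing import Sequence

# FNC-1 stance labels; the first three count as "related".
RELATED = {"agree", "disagree", "discuss"}
LABELS = RELATED | {"unrelated"}


def fnc_score(gold: Sequence[str], pred: Sequence[str]) -> float:
    """Weighted FNC-1-style score (sketch of the organizers' rule)."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    score = 0.0
    for g, p in zip(gold, pred):
        if g not in LABELS or p not in LABELS:
            raise ValueError(f"unknown label: {g!r} / {p!r}")
        if g == p:
            score += 0.25  # exact label match
            if g != "unrelated":
                score += 0.50  # extra credit for an exact related stance
        if g in RELATED and p in RELATED:
            score += 0.25  # related pair correctly flagged as related
    return score


def relative_score(gold: Sequence[str], pred: Sequence[str]) -> float:
    """Score as a fraction of the best achievable score on `gold`."""
    best = fnc_score(gold, gold)
    return fnc_score(gold, pred) / best if best else 0.0


gold = ["agree", "unrelated", "discuss"]
pred = ["agree", "unrelated", "agree"]
print(fnc_score(gold, pred))       # 1.0 + 0.25 + 0.25 = 1.5
print(relative_score(gold, pred))  # 1.5 / 2.25
```

Per pair, this gives 1.0 for a correct related stance, 0.25 for a correct "unrelated" or for a related pair with the wrong stance, and 0 when the related/unrelated decision itself is wrong.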
SemEval-2016 Task 6: Detecting Stance in Tweets
The SemEval-2016 Task 6 dataset is used for detecting the stance of tweets toward a given target.
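Tweets in this task are labeled FAVOR, AGAINST, or NONE. As we understand the official evaluation, the reported score is the average of the F1 scores for FAVOR and AGAINST; NONE is not averaged in as a class of its own, but NONE predictions still count as misses or false alarms for the other two. A minimal sketch, with hypothetical function names:

```python
from typing import Sequence


def f1_for(label: str, gold: Sequence[str], pred: Sequence[str]) -> float:
    """F1 for one class, treating every other label as negative."""
    tp = sum(1 for g, p in zip(gold, pred) if g == label and p == label)
    fp = sum(1 for g, p in zip(gold, pred) if g != label and p == label)
    fn = sum(1 for g, p in zip(gold, pred) if g == label and p != label)
    if tp == 0:
        return 0.0  # F1 is 0 (or undefined) without true positives
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def f_avg(gold: Sequence[str], pred: Sequence[str]) -> float:
    """Average of FAVOR and AGAINST F1; NONE is not averaged in."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    return (f1_for("FAVOR", gold, pred) + f1_for("AGAINST", gold, pred)) / 2


gold = ["FAVOR", "AGAINST", "NONE", "AGAINST"]
pred = ["FAVOR", "NONE", "NONE", "AGAINST"]
print(f_avg(gold, pred))  # (1.0 + 2/3) / 2
```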
Rotten Tomatoes
The Rotten Tomatoes dataset has 5,331 positive and 5,331 negative review sentences.
HONEST Race
This dataset is used for the toxicity and stereotype mitigation task and consists of 25,000 examples of positive and negative movie reviews.
IMDb Review Dataset
The IMDb review dataset is used for the positive generation task (generating positive-sentiment text).
AmazonCat-13K
AmazonCat-13K is the dataset used in the LightDXML paper for extreme multi-label classification.
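Extreme multi-label classification is usually evaluated with ranking metrics such as precision@k: the fraction of a model's k highest-scoring labels that are actually relevant. We have not checked which metrics the LightDXML paper reports, so treat this as a generic sketch; `precision_at_k` and `mean_precision_at_k` are hypothetical names. With roughly 13K candidate labels per example, `heapq.nlargest` picks the top k without sorting the whole score vector.

```python
import heapq
from typing import Sequence, Set


def precision_at_k(scores: Sequence[float], relevant: Set[int], k: int) -> float:
    """Fraction of the k highest-scoring label indices that are relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    top_k = heapq.nlargest(k, range(len(scores)), key=scores.__getitem__)
    hits = sum(1 for j in top_k if j in relevant)
    return hits / k


def mean_precision_at_k(all_scores, all_relevant, k: int) -> float:
    """Average precision@k over a batch of examples."""
    values = [precision_at_k(s, r, k) for s, r in zip(all_scores, all_relevant)]
    return sum(values) / len(values) if values else 0.0


scores = [0.1, 0.9, 0.3, 0.8]  # one example, four candidate labels
relevant = {1, 2}              # indices of its true labels
print(precision_at_k(scores, relevant, 1))  # 1.0
print(precision_at_k(scores, relevant, 3))  # top-3 is {1, 3, 2}: 2/3
```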
The Pile dataset
The Pile dataset is a large-scale dataset containing 800 GB of text data.
LM-Extraction benchmark
The LM-Extraction benchmark contains 15,000 pairs of prefixes and suffixes derived from The Pile dataset (Gao et al., 2020)...
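A natural way to use these pairs is to prompt a model with each prefix and check whether it reproduces the true suffix verbatim. The sketch below computes that exact-match rate under two assumptions: both sides of a pair are tuples of token IDs, and success means a token-for-token match. `generate` is a hypothetical stand-in for the language model under test, not an API provided by the benchmark.

```python
from typing import Callable, Sequence, Tuple

Tokens = Tuple[int, ...]


def exact_match_rate(
    pairs: Sequence[Tuple[Tokens, Tokens]],
    generate: Callable[[Tokens, int], Tokens],
) -> float:
    """Fraction of pairs whose suffix is reproduced token-for-token."""
    if not pairs:
        return 0.0
    hits = 0
    for prefix, suffix in pairs:
        guess = tuple(generate(prefix, len(suffix)))  # request exactly len(suffix) tokens
        hits += int(guess == suffix)
    return hits / len(pairs)


# Toy stand-in "model" that has memorized exactly one continuation.
memorized = {(1, 2, 3): (4, 5)}


def toy_generate(prefix: Tokens, n: int) -> Tokens:
    return memorized.get(prefix, (0,) * n)


pairs = [((1, 2, 3), (4, 5)), ((7, 8, 9), (10, 11))]
print(exact_match_rate(pairs, toy_generate))  # 0.5
```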
TREC05 spam corpus
The dataset used in the paper is the TREC05 spam corpus, which contains 39,399 real ham and 52,790 spam emails.
Neural Speed Reading with Structural-Jump-LSTM
The dataset consists of 108 news headlines, 72 of which are true and 36 of which are false.
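The 72/36 split is imbalanced 2:1, so a classifier that always answers "true" already reaches 72/108 = 2/3 (about 66.7%) accuracy, which is the baseline any model on this data should beat. A small helper, with the hypothetical name `class_summary`, makes the arithmetic explicit using exact fractions:

```python
from fractions import Fraction
from typing import Dict


def class_summary(counts: Dict[str, int]) -> dict:
    """Total size, per-class proportions, and majority-class baseline accuracy."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("counts must not all be zero")
    majority = max(counts, key=counts.get)
    return {
        "total": total,
        "proportions": {label: Fraction(n, total) for label, n in counts.items()},
        "majority_label": majority,
        "majority_baseline": Fraction(counts[majority], total),
    }


summary = class_summary({"true": 72, "false": 36})
print(summary["total"])                     # 108
print(summary["majority_baseline"])         # 2/3
print(float(summary["majority_baseline"]))  # about 0.667
```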
Dual-sparse Regularized Randomized Reduction
The paper proposes dual-sparse regularized randomized reduction methods for classification. The dataset used in the paper is the RCV1-binary dataset.
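For context, the simplest randomized reduction is a Gaussian random projection: each d-dimensional example x is mapped to z = Rᵀx, where R is a d×m matrix with i.i.d. N(0, 1/m) entries, which preserves squared norms in expectation. This sketch shows only that generic step, not the paper's dual-sparse regularized method. Rows are stored as sparse `{index: value}` dicts because RCV1-binary is high-dimensional and sparse; the function names are hypothetical.

```python
import math
import random
from typing import Dict, List

Matrix = List[List[float]]


def gaussian_projection_matrix(d: int, m: int, seed: int = 0) -> Matrix:
    """d x m matrix with i.i.d. N(0, 1/m) entries (seeded for reproducibility)."""
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(m)
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(m)] for _ in range(d)]


def project_sparse(row: Dict[int, float], R: Matrix) -> List[float]:
    """Map a sparse d-dim row {index: value} to a dense m-dim vector z = R^T x.

    Only nonzero features are touched, so the cost is O(nnz * m).
    """
    m = len(R[0])
    z = [0.0] * m
    for i, v in row.items():
        r_i = R[i]
        for j in range(m):
            z[j] += v * r_i[j]
    return z


R = gaussian_projection_matrix(d=1000, m=64, seed=42)
x = {3: 1.0, 250: 0.5, 999: 2.0}  # sparse example with three nonzeros
z = project_sparse(x, R)
print(len(z))  # 64
```

Because R is fixed by the seed, training and test examples must be projected with the same matrix.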
Liar, Liar Pants on Fire: A New Benchmark Dataset for Fake News Detection
A new benchmark dataset for fake news detection, containing 12,836 short statements labeled for truthfulness and annotated with subject, context/venue, speaker, state, party, and prior history.
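LIAR assigns each statement one of six truthfulness labels: pants-fire, false, barely-true, half-true, mostly-true, and true. A record type for the fields listed above might look like the following; the class and field names are our own hypothetical choices, not the column names of the released files, and the shape of `prior_history` (a dict of label counts) is an assumption.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# LIAR's six truthfulness labels, ordered from least to most truthful.
TRUTH_LABELS = ["pants-fire", "false", "barely-true", "half-true", "mostly-true", "true"]
LABEL_TO_ID = {label: i for i, label in enumerate(TRUTH_LABELS)}


@dataclass
class LiarStatement:
    """Hypothetical record for one LIAR statement and its metadata."""
    statement: str
    label: str          # one of TRUTH_LABELS
    subjects: List[str]
    context: str        # venue/location where the statement was made
    speaker: str
    state: str
    party: str
    prior_history: Dict[str, int] = field(default_factory=dict)  # assumed: speaker's past label counts

    def __post_init__(self) -> None:
        if self.label not in LABEL_TO_ID:
            raise ValueError(f"unknown truthfulness label: {self.label!r}")

    @property
    def label_id(self) -> int:
        """Ordinal id: 0 = pants-fire up to 5 = true."""
        return LABEL_TO_ID[self.label]


example = LiarStatement(
    statement="Example claim text.",
    label="half-true",
    subjects=["economy"],
    context="a TV interview",
    speaker="jane-doe",
    state="Texas",
    party="independent",
)
print(example.label_id)  # 3
```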
news20-binary
The dataset used in the paper is the news20-binary dataset.
E2006-log1p
The dataset used in the paper is the E2006-log1p dataset.
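news20-binary (news20.binary) and E2006-log1p, like RCV1-binary, are distributed in the LIBSVM dataset collection as sparse text files with one example per line in the form `label index:value index:value ...`. For news20-binary the label is ±1, while E2006-log1p is a regression dataset with a real-valued target, so this minimal parser (hypothetical function names) reads every label as a float.

```python
from typing import Dict, Iterable, List, Tuple

SparseRow = Dict[int, float]


def parse_libsvm_line(line: str) -> Tuple[float, SparseRow]:
    """Parse 'label idx:val idx:val ...' into (label, {idx: val})."""
    body = line.split("#", 1)[0].strip()  # ignore optional trailing comments
    if not body:
        raise ValueError("empty line")
    label_str, *pairs = body.split()
    row: SparseRow = {}
    for pair in pairs:
        idx, val = pair.split(":", 1)
        row[int(idx)] = float(val)  # indices kept as written (usually 1-based)
    return float(label_str), row


def parse_libsvm(lines: Iterable[str]) -> Tuple[List[float], List[SparseRow]]:
    """Parse a whole file (or any iterable of lines), skipping blank lines."""
    labels: List[float] = []
    rows: List[SparseRow] = []
    for line in lines:
        if line.strip():
            y, x = parse_libsvm_line(line)
            labels.append(y)
            rows.append(x)
    return labels, rows


# Two classification-style lines and one regression-style line.
sample = ["+1 3:0.5 10:1.25", "-1 2:1", "", "-0.75 1:2.5"]
y, X = parse_libsvm(sample)
print(y)  # [1.0, -1.0, -0.75]
print(X)  # [{3: 0.5, 10: 1.25}, {2: 1.0}, {1: 2.5}]
```

Since `parse_libsvm` accepts any iterable of lines, an open file handle can be passed to it directly.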