-
Proprietary Large-Scale Industry Dataset
The dataset used in the paper proposing Joint Multi-Domain Learning for Automatic Short Answer Grading. -
IMDb Review Dataset
The IMDb review dataset is used for a positive-sentiment generation task. -
AmazonCat-13K
The dataset used in the LightDXML paper for extreme multi-label classification. -
The Pile dataset
The Pile is a large-scale dataset containing 800 GB of text data. -
LM-Extraction benchmark
The LM-Extraction benchmark is derived from The Pile dataset (Gao et al., 2020) and contains 15,000 pairs of prefixes and suffixes... -
TREC05 spam corpus
The dataset used in the paper is the TREC05 spam corpus, which contains 39,399 ham and 52,790 spam emails. -
Neural Speed Reading with Structural-Jump-LSTM
The dataset consists of 108 news headlines, 72 of which are true and 36 of which are false. -
Sample Selection for Data Augmentation in Natural Language Processing
Deep learning-based text classification models need abundant labeled data to obtain competitive performance. To tackle this, several studies use data augmentation to... -
Dual-sparse Regularized Randomized Reduction
The paper proposes dual-sparse regularized randomized reduction methods for classification. The dataset used in the paper is the RCV1-binary dataset. -
FNID: Fake News Inference Dataset
A dataset for fake news inference. -
Detecting Opinion Spams and Fake News Using Text Classification
A dataset for opinion spam and fake news detection. -
Liar, Liar Pants on Fire: A New Benchmark Dataset for Fake News Detection
A new benchmark dataset for fake news detection, containing 12,836 short statements labeled for truthfulness, subject, context/venue, speaker, state, party, and prior history. -
news20-binary
The dataset used in the paper is the news20-binary dataset. -
E2006-log1p
The dataset used in the paper is the E2006-log1p dataset.