Text Classification - Groups

Proprietary Large-Scale Industry Dataset

The dataset used in the paper proposing Joint Multi-Domain Learning for Automatic Short Answer Grading.
- Dataset
- JSON
IMDb Review Dataset

The IMDb review dataset is used for the positive-sentiment generation task.
- Dataset
- JSON
AmazonCat-13K

The dataset used in the LightDXML paper for extreme multi-label classification.
- Dataset
- JSON
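Extreme multi-label classification is commonly evaluated with precision@k, the fraction of a model's k highest-scoring labels that are actually relevant. The sketch below is a minimal, dictionary-based illustration that assumes a score is available for every candidate label; it is not taken from the LightDXML paper, and the function names are hypothetical.

```python
def precision_at_k(scores, true_labels, k):
    """Precision@k for one instance.

    scores: dict mapping label id -> model score (higher = more relevant)
    true_labels: set of relevant label ids for this instance
    """
    # Rank labels by descending score and keep the top k.
    top_k = sorted(scores, key=scores.get, reverse=True)[:k]
    hits = sum(1 for label in top_k if label in true_labels)
    return hits / k


def mean_precision_at_k(all_scores, all_true, k):
    """Average precision@k over a list of instances."""
    values = [precision_at_k(s, t, k) for s, t in zip(all_scores, all_true)]
    return sum(values) / len(values)
```

For example, with scores `{0: 0.9, 1: 0.1, 2: 0.8, 3: 0.3}` and relevant labels `{0, 3}`, the top-3 labels are 0, 2, 3, so precision@3 is 2/3.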
The Pile dataset

The Pile dataset is a large-scale dataset containing 800GB of text data.
- Dataset
- JSON
LM-Extraction benchmark

The LM-Extraction benchmark consists of 15,000 prefix-suffix pairs drawn from The Pile dataset (Gao et al., 2020)...
- Dataset
- JSON
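A natural way to score prefix-suffix pairs is an exact-match extraction rate: prompt the model with each prefix and count how often its continuation reproduces the true suffix token for token. This is a minimal sketch under that assumption; `generate` is a hypothetical stand-in for a language model's decoding call, and the benchmark's official scoring protocol may differ.

```python
from typing import Callable, Sequence, Tuple

Tokens = Sequence[int]


def exact_extraction_rate(
    pairs: Sequence[Tuple[Tokens, Tokens]],
    generate: Callable[[Tokens, int], Tokens],
) -> float:
    """Fraction of (prefix, suffix) pairs whose suffix is reproduced exactly.

    `generate(prefix, n)` is hypothetical: it should return n continuation
    tokens for the given prefix (e.g. via greedy decoding).
    """
    if not pairs:
        return 0.0
    hits = 0
    for prefix, suffix in pairs:
        continuation = generate(prefix, len(suffix))
        # Count an extraction only if every suffix token matches, in order.
        if list(continuation) == list(suffix):
            hits += 1
    return hits / len(pairs)
```

A toy "model" that has memorized one prefix recovers one of two suffixes and scores 0.5.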
TREC05 spam corpus

The dataset used in the paper is the TREC05 spam corpus, which contains 39,399 legitimate (ham) and 52,790 spam emails.
- Dataset
- JSON
Neural Speed Reading with Structural-Jump-LSTM

The dataset consists of 108 news headlines, 72 of which are true and 36 of which are false.
- Dataset
- JSON
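With a 2:1 class split, a trivial classifier that always predicts the majority label already reaches two-thirds accuracy, which is the floor any model evaluated on these headlines should beat. The arithmetic as a short Python check:

```python
from fractions import Fraction

# Label counts as stated for this dataset.
counts = {"true": 72, "false": 36}
total = sum(counts.values())  # 108 headlines

# Always predicting the most frequent label is correct exactly as often
# as that label occurs.
majority_label = max(counts, key=counts.get)
majority_accuracy = Fraction(counts[majority_label], total)

print(majority_label, majority_accuracy, float(majority_accuracy))
# prints: true 2/3 0.6666666666666666
```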
Sample Selection for Data Augmentation in Natural Language Processing

Deep learning-based text classification models need abundant labeled data to obtain competitive performance. To tackle this, many studies use data augmentation to...
- Dataset
- JSON
gisette

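The gisette dataset is a binary classification benchmark from the NIPS 2003 feature selection challenge, built from handwritten digit images (separating the digits 4 and 9) and represented with 5,000 features.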
- Dataset
- JSON
epsilon

The epsilon dataset is a dense binary classification benchmark from the 2008 PASCAL Large Scale Learning Challenge, with 400,000 training examples, 100,000 test examples, and 2,000 features.
- Dataset
- JSON
Dual-sparse Regularized Randomized Reduction

The paper proposes dual-sparse regularized randomized reduction methods for classification. The dataset used in the paper is the RCV1-binary dataset.
- Dataset
- JSON
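Randomized reduction generally means mapping high-dimensional features (such as RCV1's sparse bag-of-words vectors) to a much lower dimension with a random matrix before training. The sketch below shows only a generic Gaussian random projection, assuming i.i.d. N(0, 1/m) entries; it does not implement the paper's dual-sparse regularized recovery step, and the function names are hypothetical.

```python
import math
import random


def gaussian_projection(d, m, seed=0):
    """Return an m x d matrix with i.i.d. N(0, 1/m) entries.

    Scaling by 1/sqrt(m) gives E[||A x||^2] = ||x||^2, so squared norms
    (and inner products) are preserved in expectation.
    """
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(m)
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(d)] for _ in range(m)]


def project(A, x):
    """Compute the reduced feature vector A @ x."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in A]
```

Projecting a 1,000-dimensional vector to 200 dimensions this way keeps its squared norm within roughly 10% (one standard deviation) of the original, which is why a linear classifier trained on the reduced features can remain competitive.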
FNID: Fake News Inference Dataset

A dataset for fake news inference.
- Dataset
- JSON
Detecting Opinion Spams and Fake News Using Text Classification

A dataset for opinion spam and fake news detection.
- Dataset
- JSON
Liar, Liar Pants on Fire: A New Benchmark Dataset for Fake News Detection

A new benchmark dataset for fake news detection, containing 12,836 short statements labeled for truthfulness, subject, context/venue, speaker, state, party, and prior history.
- Dataset
- JSON
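LIAR's truthfulness annotation uses six ordered classes (pants-fire, false, barely-true, half-true, mostly-true, true). When a binary fake/real task is needed, one option is to threshold on that ordering. The sketch below assumes the label strings as listed; the `to_binary` helper and its default threshold are hypothetical choices, not part of the benchmark.

```python
# The six LIAR truthfulness labels, ordered from least to most truthful.
LIAR_LABELS = ["pants-fire", "false", "barely-true", "half-true", "mostly-true", "true"]


def to_binary(label, threshold="half-true"):
    """Collapse a six-way LIAR label into a binary label.

    Hypothetical rule: labels ranked at or above `threshold` map to 1 (real),
    labels ranked below it map to 0 (fake).
    """
    rank = LIAR_LABELS.index(label)
    return int(rank >= LIAR_LABELS.index(threshold))
```

Moving the threshold changes the class balance of the resulting binary task, so results under different collapses are not directly comparable.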
kdda

The dataset used in the paper for compressed sensing, Lasso regression, and logistic Lasso regression problems.
- Dataset
- JSON
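For reference, the Lasso problem minimizes (1/2)||Ax - b||^2 + lambda * ||x||_1. A minimal solver is iterative soft-thresholding (ISTA), sketched here in pure Python for small dense inputs. This is a generic textbook method, not necessarily the algorithm the paper evaluates on kdda; a dataset of kdda's size would need a sparse implementation.

```python
def soft_threshold(v, tau):
    """Proximal operator of tau * |.|: shrink v toward zero by tau."""
    if v > tau:
        return v - tau
    if v < -tau:
        return v + tau
    return 0.0


def lasso_ista(A, b, lam, iters=500):
    """Minimize 0.5 * ||A x - b||^2 + lam * ||x||_1 with ISTA.

    A is a list of rows (n x d) and b a list of length n.
    """
    n, d = len(A), len(A[0])
    # 1 / ||A||_F^2 is a safe step size because ||A||_F^2 >= ||A||_2^2.
    step = 1.0 / sum(a * a for row in A for a in row)
    x = [0.0] * d
    for _ in range(iters):
        # Gradient of the smooth part: A^T (A x - b).
        residual = [sum(A[i][j] * x[j] for j in range(d)) - b[i] for i in range(n)]
        grad = [sum(A[i][j] * residual[i] for i in range(n)) for j in range(d)]
        # Gradient step followed by the L1 proximal (shrinkage) step.
        x = [soft_threshold(x[j] - step * grad[j], step * lam) for j in range(d)]
    return x
```

With `A` the 2 x 2 identity, `b = [3.0, 0.5]`, and `lam = 1.0`, the solution is the soft-thresholded `b`, i.e. approximately `[2.0, 0.0]`: the small coefficient is set exactly to zero, which is the sparsity effect the L1 penalty is used for.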
news20-binary

The dataset used in the paper is the news20-binary dataset.
- Dataset
- JSON
url

The dataset used in the paper is the url dataset.
- Dataset
- JSON
E2006-log1p

The dataset used in the paper is the E2006-log1p dataset.
- Dataset
- JSON
webspam

The dataset used in the paper is the webspam dataset.
- Dataset
- JSON
REUTERS

The dataset is used to evaluate the performance of the Linear Additive Markov Process (LAMP) on real-world sequences.
- Dataset
- JSON
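A LAMP predicts the next state from the whole recent history rather than only the last state. Assuming the formulation in which a lag i is chosen with weight w_i and the next state is drawn from the transition row of the state i steps back (one shared matrix M), the next-state distribution is a weighted mixture of rows of M. The sketch computes that mixture; renormalizing when the history is shorter than the number of lags is an assumption of this sketch, and the function name is hypothetical.

```python
def lamp_next_distribution(history, M, w):
    """Next-state distribution of a linear additive Markov process (sketch).

    history: past states, most recent last (must be non-empty)
    M: row-stochastic transition matrix (list of lists)
    w: non-negative lag weights; w[0] applies to the most recent state,
       w[1] to the one before it, and so on.
    """
    if not history:
        raise ValueError("history must contain at least one state")
    n_states = len(M)
    dist = [0.0] * n_states
    # Mix the transition rows of recent states, weighted by lag.
    for lag, weight in enumerate(w, start=1):
        if lag > len(history):
            break
        row = M[history[-lag]]
        for j in range(n_states):
            dist[j] += weight * row[j]
    # Renormalize in case some lags fell outside a short history.
    total = sum(dist)
    return [p / total for p in dist]
```

With `M = [[0.9, 0.1], [0.2, 0.8]]`, weights `[0.5, 0.5]`, and history `[0, 1]`, the result mixes the rows for states 1 and 0 equally, giving approximately `[0.55, 0.45]`; a plain first-order Markov chain would instead return `[0.2, 0.8]`.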

182 datasets found