Text Classification - Groups

LLM dataset

The dataset used in this paper is not explicitly described, but it is mentioned that it is a large language model (LLM) and that the authors used it to train and evaluate their...

Dataset
JSON

Sample Selection for Data Augmentation in Natural Language Processing

Deep learning-based text classification models need abundant labeled data to obtain competitive performance. To tackle this, multiple studies try to use data augmentation to...

Dataset
JSON

FNID: Fake News Inference Dataset

A dataset for fake news inference

Dataset
JSON

Detecting Opinion Spams and Fake News Using Text Classification

A dataset for opinion spam and fake news detection

Dataset
JSON

Liar, Liar Pants on Fire: A New Benchmark Dataset for Fake News Detection

A new benchmark dataset for fake news detection, containing 12,836 short statements labeled for truthfulness, subject, context/venue, speaker, state, party, and prior history.

Dataset
JSON

Penn Tree Bank

The Penn Tree Bank dataset is a corpus split into a training set of 929k words, a validation set of 73k words, and a test set of 82k words. The...

Dataset
JSON

Wikipedia dataset

The dataset used in the paper is the Wikipedia dataset, which contains over six million English Wikipedia articles with a full-text field associated with 50 training queries...

Dataset
JSON

BERT

The dataset used in this paper is a pre-trained BERT model trained on English Wikipedia and Books datasets.

Dataset
JSON

SST-2

The dataset used for the experiments across ten models, ranging from bag-of-words models to pre-trained transformers, and find that a model having higher AUC does not necessarily...

Dataset
JSON

Text8

Word2Vec is a distributed word embedding generator that uses an artificial neural network to learn dense vector representations of words.

Dataset
JSON

Penn Treebank

The Penn Treebank dataset contains one million words of 1989 Wall Street Journal material annotated in Treebank II style, with 42k sentences of varying lengths.

Dataset
JSON

IMDB

The dataset used in the paper is not explicitly described, but it is mentioned that the authors tested the proposed method on three real data sets for the most relevant security...

Dataset
JSON

Penn Treebank dataset

The dataset used in the paper is the Penn Treebank dataset, which is a large-scale text classification dataset.

Dataset
JSON

Training Language Models to Perform Tasks

A dataset for training language models to perform tasks such as question answering and text classification.

Dataset
JSON

GLUE

Pre-trained language models (PrLM) have to carefully manage input units when training on a very large text with a vocabulary consisting of millions of words. Previous works have...

Dataset
JSON

15 datasets found