Text Classification - Groups

BookCorpus

A corpus of paragraphs from unlabeled text, used for unsupervised sentence representation learning.

Dataset
JSON

Reuters RCV1-v2

Reuters RCV1-v2 contains 804,414 newswire articles. Its 103 topics form a tree hierarchy, so documents typically carry multiple labels. The data was randomly...

Dataset
JSON

IMDB

The source paper does not describe the dataset explicitly; it mentions only that the authors tested the proposed method on three real data sets for the most relevant security...

Dataset
JSON

Penn Treebank dataset

The Penn Treebank is a corpus of English text annotated with part-of-speech tags and syntactic parse trees, widely used for parsing and language modeling.

Dataset
JSON

MNIST-SVHN-Text dataset

The MNIST-SVHN-Text dataset is a multi-modal dataset consisting of images, text, and labels.

Dataset
JSON

TREC

A dataset used for sentiment analysis, question type classification, and subjectivity classification tasks.

Dataset
JSON

LAION

The source paper does not describe the dataset explicitly; it mentions only that LAION is a large-scale captioned image dataset used to train the Stable Diffusion model.

Dataset
JSON

Training Language Models to Perform Tasks

A dataset for training language models to perform tasks such as question answering and text classification.

Dataset
JSON

GLUE

Pre-trained language models (PrLM) have to carefully manage input units when training on a very large text with a vocabulary consisting of millions of words. Previous works have...

Dataset
JSON

E2E dataset

The E2E dataset consists of 50K restaurant reviews, each labeled by food type, price, and customer rating.

Dataset
JSON

Elsevier OA CC-BY corpus

The Elsevier OA CC-BY corpus consists of 40,000 open-access articles from across Elsevier's journals, spanning a diverse range of research disciplines.

Dataset
JSON

211 datasets found