Text Classification - Groups

Text Classification as Matching

Many-class text classification is formulated as a matching problem between the input texts and the class descriptions.
- Dataset
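A minimal sketch of the matching formulation in plain Python. It assumes a bag-of-words representation and cosine similarity as the matching score. Both are illustrative choices, not necessarily what the original work uses. Each input text is scored against every class description, and the best-matching class is returned. The names `bow`, `cosine`, and `classify` and the example class descriptions are hypothetical.

```python
import math
import re
from collections import Counter


def bow(text):
    """Bag-of-words counts over lowercase alphabetic tokens (illustrative encoder)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[token] for token, count in a.items() if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def classify(text, class_descriptions):
    """Return the label whose description best matches `text`."""
    x = bow(text)
    scores = {label: cosine(x, bow(desc)) for label, desc in class_descriptions.items()}
    return max(scores, key=scores.get)


# Hypothetical class descriptions; any number of classes can be added.
classes = {
    "sports": "football basketball game team score",
    "politics": "election government vote policy",
}
print(classify("The team won the football game", classes))
```

Because classes enter only through their descriptions, a new class can be added by writing a new description. Swapping `bow` for a learned text encoder keeps the same argmax-over-descriptions structure.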
Newsgroups 4

The dataset used in the associated paper for Dominant Set Clustering.
- Dataset
Newsgroups 3

The dataset used in the associated paper for Dominant Set Clustering.
- Dataset
Newsgroups 2

The dataset used in the associated paper for Dominant Set Clustering.
- Dataset
SST-1, SST-2, SUBJ, IMDB

Datasets used for text classification tasks: SST-1, SST-2, SUBJ, and IMDB.
- Dataset
Text Classification

Text classification dataset
- Dataset
Text, Tabular and Image Classification

Text, tabular and image classification datasets
- Dataset
Sent140 dataset

A real-world dataset for sentiment analysis.
- Dataset
Online news popularity data

The dataset contains features of articles published by the Mashable website over a two-year period.
- Dataset
MPQA Dataset

The MPQA dataset contains 10,606 opinions, each labeled as objective or subjective.
- Dataset
CR Dataset

CR contains 3,775 reviews of products (e.g., a music player), while MR is a movie review repository containing 10,662 reviews.
- Dataset
Movie Review Repository (MR)

The word-level model consists of one convolutional layer, followed by a max-pooling layer, a fully connected layer with dropout, and finally a softmax output layer.
- Dataset
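The layer sequence described for the word-level model (convolution, max pooling, fully connected layer with dropout, softmax) can be sketched as a forward pass in plain Python. This is not the original implementation. The toy weights, the ReLU activation after the convolution, max-over-time pooling, and inverted dropout are all assumptions, and the parameter names are hypothetical. Dropout is active only when a random generator is passed in (training mode) and is the identity at inference.

```python
import math
import random


def conv1d_relu(embeddings, filters, biases):
    """Valid 1-D convolution over a sequence of word vectors, then ReLU.

    Each filter is a list of `width` weight vectors, one per word position.
    Returns one feature map (list of activations) per filter.
    """
    width = len(filters[0])
    if len(embeddings) < width:
        raise ValueError("sentence is shorter than the filter width")
    feature_maps = []
    for weights, bias in zip(filters, biases):
        fmap = []
        for start in range(len(embeddings) - width + 1):
            s = bias
            for k in range(width):
                s += sum(w * x for w, x in zip(weights[k], embeddings[start + k]))
            fmap.append(max(0.0, s))  # ReLU (assumed activation)
        feature_maps.append(fmap)
    return feature_maps


def max_pool(feature_maps):
    """Max-over-time pooling: keep the largest activation of each feature map."""
    return [max(fmap) for fmap in feature_maps]


def dropout(values, rate, rng=None):
    """Inverted dropout when `rng` is given (training); identity otherwise."""
    if rng is None:
        return list(values)
    keep = 1.0 - rate
    return [v / keep if rng.random() < keep else 0.0 for v in values]


def dense(values, weights, biases):
    """Fully connected layer: one weight row and one bias per output class."""
    return [b + sum(w * v for w, v in zip(row, values)) for row, b in zip(weights, biases)]


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def forward(embeddings, params, rng=None):
    maps = conv1d_relu(embeddings, params["filters"], params["conv_bias"])
    pooled = max_pool(maps)
    dropped = dropout(pooled, params["dropout"], rng)
    logits = dense(dropped, params["fc_weights"], params["fc_bias"])
    return softmax(logits)


# Hypothetical toy parameters: 2-d word vectors, two width-2 filters, two classes.
params = {
    "filters": [[[1.0, 0.0], [0.0, 1.0]], [[-1.0, 0.5], [0.5, -1.0]]],
    "conv_bias": [0.0, 0.1],
    "dropout": 0.5,
    "fc_weights": [[1.0, -1.0], [-1.0, 1.0]],
    "fc_bias": [0.0, 0.0],
}
sentence = [[0.2, 0.1], [0.4, 0.3], [0.0, 0.5], [0.3, 0.3]]
probs = forward(sentence, params)  # inference: dropout disabled
train_probs = forward(sentence, params, random.Random(0))  # training-mode pass
print(probs)
```

Max pooling makes the classifier's input size depend only on the number of filters, not on sentence length, which is why one fully connected layer can follow it.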
DBpedia Ontology Dataset

Two representative DNN models and corresponding datasets were chosen as experimental targets to evaluate the effectiveness of the proposed method.
- Dataset
Banknote Authentication

Data extracted from real images of forged banknotes, captured with an industrial camera.
- Dataset
RTE dataset

RTE dataset
- Dataset
FastText

FastText is a subword token embedding model. It produces a vector representation of a word by combining the embeddings of the character n-grams that make up the word.
- Dataset
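The subword composition can be sketched in plain Python. Following the fastText paper, each word is wrapped in '<' and '>' boundary markers and split into character n-grams with n from 3 to 6. The n-gram vectors are then averaged into a word vector. The hash-derived `ngram_vector` is a hypothetical placeholder for trained embeddings, and averaging is one choice of composition. The fastText paper also adds a vector for the whole bracketed word and hashes n-grams into a fixed number of buckets, and this sketch omits both.

```python
import hashlib


def char_ngrams(word, min_n=3, max_n=6):
    """Character n-grams of `word`, wrapped in '<' and '>' boundary markers."""
    wrapped = f"<{word}>"
    grams = []
    for n in range(min_n, max_n + 1):
        for i in range(len(wrapped) - n + 1):
            grams.append(wrapped[i:i + n])
    return grams


def ngram_vector(gram, dim=8):
    """Hypothetical stand-in for a learned n-gram embedding.

    Derives a deterministic pseudo-random vector in [-1, 1] from a SHA-256
    hash; a trained model would look the vector up in an embedding table.
    """
    digest = hashlib.sha256(gram.encode("utf-8")).digest()  # 32 bytes
    if dim > len(digest):
        raise ValueError("dim must be at most 32 for this toy embedding")
    return [digest[i] / 255.0 * 2.0 - 1.0 for i in range(dim)]


def word_vector(word, dim=8, min_n=3, max_n=6):
    """Compose a word vector by averaging the vectors of its character n-grams."""
    grams = char_ngrams(word, min_n, max_n)
    vec = [0.0] * dim
    if not grams:  # e.g. the empty string has no n-grams of length >= 3
        return vec
    for gram in grams:
        for i, x in enumerate(ngram_vector(gram, dim)):
            vec[i] += x
    return [x / len(grams) for x in vec]


print(char_ngrams("where", 3, 3))  # ['<wh', 'whe', 'her', 'ere', 're>']
print(word_vector("wherever")[:3])  # defined even for words unseen in training
```

Because a word's vector is built from its pieces, words that share n-grams (such as "where" and "wherever") share parameters, and out-of-vocabulary words still receive a representation.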
Hatespeech

The Hatespeech dataset is a collection of tweets containing terms from hate speech lexicons.
- Dataset
Amazon Books

The Amazon Books dataset is a collection of user ratings for books, with each rating indicating the user's preference for the book.
- Dataset
C4 dataset

The paper does not describe its data in detail, but it states that the authors trained a GPT-2 transformer language model on the C4 dataset.
- Dataset
Penn Tree Bank

The Penn Tree Bank dataset is a corpus split into a training set of 929k words, a validation set of 73k words, and a test set of 82k words. The...
- Dataset
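The Penn Tree Bank split sizes can be checked with a little arithmetic. This sketch treats the rounded "k" figures as exact thousands. Under that assumption, the three splits total about 1.08M words, and roughly 86% of them are training data.

```python
# Approximate split sizes (in words) as listed for the Penn Tree Bank entry,
# treating "929k" etc. as exact thousands.
splits = {"train": 929_000, "valid": 73_000, "test": 82_000}

total = sum(splits.values())
for name, n_words in splits.items():
    print(f"{name}: {n_words:,} words ({n_words / total:.1%})")
print(f"total: {total:,} words")  # total: 1,084,000 words
```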
