Text Classification as Matching
Many-class text classification is formulated as a matching problem between the input texts and the class descriptions.
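One way to picture the matching formulation is to score an input text against every class description and predict the best-scoring class. The following is a minimal sketch of that idea, not the paper's method. It assumes a bag-of-words encoder with cosine similarity as a stand-in for whatever learned text encoder the original work uses. The function names and the example classes are hypothetical.

```python
import math
import re
from collections import Counter


def bag_of_words(text: str) -> Counter:
    # Assumed stand-in for a learned encoder: lowercase word counts.
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse count vectors.
    dot = sum(count * b[word] for word, count in a.items() if word in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def classify_by_matching(text: str, class_descriptions: dict[str, str]) -> str:
    # Match the input against each class description; return the best match.
    query = bag_of_words(text)
    scores = {
        label: cosine(query, bag_of_words(description))
        for label, description in class_descriptions.items()
    }
    return max(scores, key=scores.get)


# Hypothetical class descriptions for illustration only.
classes = {
    "sports": "football basketball game team score",
    "tech": "software computer code programming chip",
    "politics": "election government vote parliament policy",
}
print(classify_by_matching("The team won the football game", classes))  # sports
```

Treating classes as text makes it possible, in principle, to add a new class by writing a new description rather than retraining a fixed-size output layer. That convenience is the usual motivation for this framing when the number of classes is large.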
Newsgroups 4
A dataset used in the paper for Dominant Set Clustering.
Newsgroups 3
A dataset used in the paper for Dominant Set Clustering.
Newsgroups 2
A dataset used in the paper for Dominant Set Clustering.
SST-1, SST-2, SUBJ, IMDB
Datasets used for text classification tasks: SST-1, SST-2, SUBJ, and IMDB.
Text Classification
A text classification dataset.
Text, Tabular and Image Classification
Text, tabular, and image classification datasets.
Sent140 dataset
A real-world dataset for sentiment analysis.
Online news popularity data
The dataset contains features describing articles published by the Mashable website over a period of two years.
MPQA Dataset
The MPQA dataset contains 10,606 opinions, each labeled as Objective or Subjective.
CR Dataset
The MR dataset is a movie review repository containing 10,662 reviews, while CR contains 3,775 reviews of products such as music players.
Movie Review Repository (MR)
A movie review repository containing 10,662 reviews. The word-level model used with it consists of one convolutional layer, followed by a max pooling layer, a fully connected layer with dropout, and finally a softmax output layer.
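The word-level architecture described for MR (one convolutional layer, max pooling, a fully connected layer with dropout, and a softmax output layer) can be sketched as a forward pass in plain Python. This is a hypothetical, untrained illustration, not the source's implementation. The layer sizes, the ReLU activations, the uniform weight initialization, inverted dropout, and reading "fully connected layer" as a hidden layer placed before a separate softmax output layer are all assumptions.

```python
import math
import random


def relu(x: float) -> float:
    return max(0.0, x)


def softmax(logits: list[float]) -> list[float]:
    # Subtract the max logit for numerical stability.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def dense(weights, bias, inputs):
    # Fully connected layer: weights[i] is the weight row for output unit i.
    return [sum(w * x for w, x in zip(row, inputs)) + b for row, b in zip(weights, bias)]


def init(rng, rows, cols):
    # Assumed initialization: small uniform random weights.
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


class WordCNN:
    """Hypothetical word-level CNN: embedding -> one convolutional layer ->
    max pooling over time -> fully connected layer with dropout -> softmax."""

    def __init__(self, vocab_size, embed_dim, num_filters, window,
                 hidden, num_classes, dropout=0.5, seed=0):
        self.rng = random.Random(seed)
        self.window, self.dropout = window, dropout
        self.embed = init(self.rng, vocab_size, embed_dim)
        # Each filter is stored flattened as window * embed_dim weights.
        self.conv_w = init(self.rng, num_filters, window * embed_dim)
        self.conv_b = [0.0] * num_filters
        self.fc_w, self.fc_b = init(self.rng, hidden, num_filters), [0.0] * hidden
        self.out_w, self.out_b = init(self.rng, num_classes, hidden), [0.0] * num_classes

    def forward(self, token_ids, train=False):
        vecs = [self.embed[t] for t in token_ids]
        if len(vecs) < self.window:
            raise ValueError("input is shorter than the convolution window")
        # Convolution: concatenate each run of `window` word vectors, apply all filters.
        windows = [sum(vecs[i:i + self.window], [])
                   for i in range(len(vecs) - self.window + 1)]
        feature_maps = [[relu(v) for v in dense(self.conv_w, self.conv_b, win)]
                        for win in windows]
        # Max pooling over time: keep each filter's strongest response.
        pooled = [max(column) for column in zip(*feature_maps)]
        hidden = [relu(v) for v in dense(self.fc_w, self.fc_b, pooled)]
        if train:
            # Inverted dropout: zero units at random and rescale the survivors.
            keep = 1.0 - self.dropout
            hidden = [h / keep if self.rng.random() < keep else 0.0 for h in hidden]
        # Softmax output layer over the classes.
        return softmax(dense(self.out_w, self.out_b, hidden))


model = WordCNN(vocab_size=50, embed_dim=8, num_filters=4, window=3,
                hidden=6, num_classes=2)
probs = model.forward([3, 17, 8, 42, 5])
print(probs)
```

Plain Python keeps the sketch dependency-free and makes each layer explicit. A real model would be built and trained with a deep learning framework.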
DBpedia Ontology Dataset
One of the datasets chosen, together with two representative DNN models, as experiment targets to evaluate the effectiveness of the proposed method.
Banknote Authentication
Data extracted from images of genuine and forged banknote-like specimens, captured with an industrial camera.
RTE dataset
The RTE (Recognizing Textual Entailment) dataset.
Hatespeech
The Hatespeech dataset is a collection of tweets containing terms from hate speech lexicons.
Amazon Books
The Amazon Books dataset is a collection of user ratings for books, with each rating indicating the user's preference for the book.
C4 dataset
The authors trained a GPT-2 transformer language model on the C4 dataset.
Penn Tree Bank
The Penn Tree Bank dataset is a corpus split into a training set of 929k words, a validation set of 73k words, and a test set of 82k words. The...