211 datasets found

Groups: Text Classification

  • BookCorpus

    A corpus of paragraphs from unlabeled text, used in the paper for unsupervised sentence representation learning.
  • Reuters RCV1-v2

    Reuters RCV1-v2 contains 804,414 newswire articles. Its 103 topics form a tree hierarchy, so documents typically have multiple labels. The data was randomly...
  • IMDB

    The paper does not explicitly describe the dataset, but mentions that the authors tested the proposed method on three real data sets for the most relevant security...
  • Penn Treebank dataset

    The Penn Treebank is a corpus of English text annotated with part-of-speech tags and syntactic parse trees.
  • MNIST-SVHN-Text dataset

    The MNIST-SVHN-Text dataset is a multi-modal dataset consisting of images, text, and labels.
  • TREC

    A dataset used for sentiment analysis, question type classification, and subjectivity classification tasks.
  • LAION

    A large-scale captioned image dataset used to train the Stable Diffusion model. The paper does not describe it in further detail.
  • Training Language Models to Perform Tasks

    A dataset for training language models to perform tasks such as question answering and text classification.
  • GLUE

    Pre-trained language models (PrLM) have to carefully manage input units when training on a very large text with a vocabulary consisting of millions of words. Previous works have...
  • E2E dataset

    The E2E dataset consists of 50K restaurant reviews, each labeled with food type, price, and customer rating.
  • Elsevier OA CC-BY corpus

    The Elsevier OA CC-BY corpus consists of 40,000 open-access articles from across Elsevier's journals, spanning a diverse range of research disciplines.