21 datasets found

Tags: Text Classification

  • LLM dataset

    The dataset used in this paper is not explicitly described, but it is mentioned that it is a large language model (LLM) and that the authors used it to train and evaluate their...
  • AGNews

    The dataset used in the paper is not explicitly described, but it is mentioned that the authors used a variety of datasets for semi-supervised learning tasks.
  • SST

    The dataset used in the paper is the Stanford Sentiment Treebank (SST) dataset, which contains standard train/dev/test sets and two subtasks: binary sentence classification or...
  • MNLI-m/mm

    The dataset is used in the paper to evaluate attribution scores.
  • Penn Tree Bank

    The Penn Tree Bank dataset is a corpus split into a training set of 929k words, a validation set of 73k words, and a test set of 82k words. The...
  • QQP

    The Quora Question Pairs (QQP) dataset consists of 50,000 question pairs labeled with paraphrase or non-paraphrase.
  • Word2Vec

    Bilingual word embeddings from parallel and non-parallel corpora for cross-language text classification
  • Experimental Results

    The authors evaluate the performance of their proposed conformal prediction methods for multistep feedback covariate shift (MFCS) on synthetic black-box optimization and active...
  • TEL-NLP

    The TEL-NLP dataset is a collection of Telugu text data for four NLP tasks: sentiment analysis, emotion identification, hate speech detection, and sarcasm detection.
  • SlimPajama

    The dataset is used to evaluate the performance of the xLSTM architecture on various tasks, including language modeling, question answering, and text classification.
  • BERT

    The dataset used in this paper is a pre-trained BERT model trained on English Wikipedia and Books datasets.
  • SQuAD

    The dataset used in the paper is a multiple-choice reading comprehension dataset, which includes a passage, question, and answer. The passage is a script, and the question is a...
  • Natural Questions

    The Natural Questions dataset consists of questions extracted from web queries, with each question accompanied by a corresponding Wikipedia article containing the answer.
  • TriviaQA

    The TriviaQA dataset is a collection of questions sourced from Quiz League websites, with sentence-level supporting facts annotation.
  • SST-2

    The dataset is used for experiments across ten models, ranging from bag-of-words models to pre-trained transformers; the authors find that a model having higher AUC does not necessarily...
  • Stanford Alpaca

    The dataset used in the paper is not explicitly described, but it is mentioned that the authors used CIFAR-10 and CIFAR-100 datasets for image classification, and ImageNet-100...
  • AG News

    The dataset used in the paper is a language domain dataset, specifically for sentiment classification, named AG News. The dataset is used to evaluate the performance of...
  • Text8

    Word2Vec is a distributed word embedding generator that uses an artificial neural network to learn dense vector representations of words.
  • Penn Treebank

    The Penn Treebank dataset contains one million words of 1989 Wall Street Journal material annotated in Treebank II style, with 42k sentences of varying lengths.
  • SNLI

    The dataset used in the paper is the Stanford Natural Language Inference (SNLI) dataset, which consists of 549,367 premise-hypothesis pairs for train/dev/test sets and target...