24 datasets found

Tags: Text Classification

  • Augmenting Interpretable Models with LLMs during Training

    Aug-GAM and Aug-Tree are two instantiations of Aug-imodels, a framework for leveraging the knowledge learned by LLMs to build extremely efficient and interpretable models.
  • KLUE

    KLUE benchmark dataset for Korean language understanding
  • Towards Improving Selective Prediction Ability of NLP Systems

    SNLI, MNLI, Stress Test, Matched Mismatched, Competence, Distraction, and Noise datasets
  • LLM dataset

    The dataset used in this paper is not explicitly described, but it is mentioned that it is a large language model (LLM) and that the authors used it to train and evaluate their...
  • AGNews

    The dataset used in the paper is not explicitly described, but it is mentioned that the authors used a variety of datasets for semi-supervised learning tasks.
  • SST

    The dataset used in the paper is the Stanford Sentiment Treebank (SST) dataset, which contains standard train/dev/test sets and two subtasks: binary sentence classification or...
  • MNLI-m/mm

    The dataset used in the paper to evaluate attribution scores.
  • Penn Tree Bank

    The Penn Tree Bank dataset is a corpus split into a training set of 929k words, a validation set of 73k words, and a test set of 82k words. The...
  • QQP

    The Quora Question Pairs (QQP) dataset consists of 50,000 question pairs, each labeled as a paraphrase or non-paraphrase.
  • Word2Vec

    Bilingual word embeddings from parallel and non-parallel corpora for cross-language text classification
  • Experimental Results

    The authors evaluate the performance of their proposed conformal prediction methods for multistep feedback covariate shift (MFCS) on synthetic black-box optimization and active...
  • TEL-NLP

    The TEL-NLP dataset is a collection of Telugu text data for four NLP tasks: sentiment analysis, emotion identification, hate speech detection, and sarcasm detection.
  • SlimPajama

    The dataset is used to evaluate the performance of the xLSTM architecture on various tasks, including language modeling, question answering, and text classification.
  • BERT

    The dataset used in this paper is a pre-trained BERT model trained on English Wikipedia and Books datasets.
  • SQuAD

    The dataset used in the paper is a multiple-choice reading comprehension dataset, which includes a passage, question, and answer. The passage is a script, and the question is a...
  • Natural Questions

    The Natural Questions dataset consists of questions extracted from web queries, with each question accompanied by a corresponding Wikipedia article containing the answer.
  • TriviaQA

    The TriviaQA dataset is a collection of questions sourced from Quiz League websites, with sentence-level supporting facts annotation.
  • SST-2

    The dataset is used for experiments across ten models, ranging from bag-of-words models to pre-trained transformers, which find that a model having higher AUC does not necessarily...
  • Stanford Alpaca

    The dataset used in the paper is not explicitly described, but it is mentioned that the authors used CIFAR-10 and CIFAR-100 datasets for image classification, and ImageNet-100...
  • AG News

    The dataset used in the paper is a language domain dataset, specifically for sentiment classification, named AG News. The dataset is used to evaluate the performance of...