Natural Language Processing - Groups

Augmenting Interpretable Models with LLMs during Training

Aug-GAM and Aug-Tree are two instantiations of Aug-imodels, a framework for leveraging the knowledge learned by LLMs to build extremely efficient and interpretable models.

KLUE

KLUE benchmark dataset for Korean language understanding

Towards Improving Selective Prediction Ability of NLP Systems

SNLI, MNLI, Stress Test, Matched Mismatched, Competence, Distraction, and Noise datasets

LLM dataset

The dataset used in this paper is not explicitly described, but it is mentioned that it is a large language model (LLM) and that the authors used it to train and evaluate their...

AGNews

The dataset used in the paper is not explicitly described, but it is mentioned that the authors used a variety of datasets for semi-supervised learning tasks.

SST

The dataset used in the paper is the Stanford Sentiment Treebank (SST) dataset, which contains standard train/dev/test sets and two subtasks: binary sentence classification or...

MNLI-m/mm

The dataset used in the paper to evaluate attribution scores.

Penn Tree Bank

The Penn Tree Bank dataset is a corpus split into a training set of 929k words, a validation set of 73k words, and a test set of 82k words. The...

QQP

The Quora Question Pairs (QQP) dataset consists of 50,000 question pairs labeled with paraphrase or non-paraphrase.

Word2Vec

Bilingual word embeddings from parallel and non-parallel corpora for cross-language text classification

Experimental Results

The authors evaluate the performance of their proposed conformal prediction methods for multistep feedback covariate shift (MFCS) on synthetic black-box optimization and active...

TEL-NLP

The TEL-NLP dataset is a collection of Telugu text data for four NLP tasks: sentiment analysis, emotion identification, hate speech detection, and sarcasm detection.

SlimPajama

The dataset is used to evaluate the performance of the xLSTM architecture on various tasks, including language modeling, question answering, and text classification.

BERT

The dataset used in this paper is a pre-trained BERT model trained on English Wikipedia and Books datasets.

SQuAD

The dataset used in the paper is a multiple-choice reading comprehension dataset, which includes a passage, question, and answer. The passage is a script, and the question is a...

Natural Questions

The Natural Questions dataset consists of questions extracted from web queries, with each question accompanied by a corresponding Wikipedia article containing the answer.

TriviaQA

The TriviaQA dataset is a collection of questions sourced from Quiz League websites, with sentence-level supporting facts annotation.

SST-2

The dataset used for the experiments across ten models (ranging from bag-of-words models to pre-trained transformers) and find that a model having higher AUC does not necessarily...

Stanford Alpaca

The dataset used in the paper is not explicitly described, but it is mentioned that the authors used CIFAR-10 and CIFAR-100 datasets for image classification, and ImageNet-100...

AG News

The dataset used in the paper is a language domain dataset, specifically for sentiment classification, named AG News. The dataset is used to evaluate the performance of...

24 datasets found