Dataset - LDM

OOS

The Out Of Scope (OOS) dataset (Larson et al., 2019) is an intent detection dataset containing 150 equally distributed classes.
- Dataset
- JSON
ATIS2 and ATIS3

The ATIS2 and ATIS3 datasets are used to create low-latency natural language understanding components.
- Dataset
- JSON
General Language Understanding Evaluation (GLUE) dataset

The General Language Understanding Evaluation (GLUE) dataset is used in the paper to evaluate the performance of natural language understanding models.
- Dataset
- JSON
FewCLUE dataset

The FewCLUE dataset is a Chinese few-shot learning evaluation benchmark.
- Dataset
- JSON
WALNUT: A Benchmark on Semi-weakly Supervised Learning for Natural Language U...

WALNUT is a benchmark for semi-weakly supervised learning for natural language understanding. It consists of 8 NLU tasks with different types, including document-level and...
- Dataset
- JSON
CoLA

The CoLA dataset has 8,551 training samples and 527 in-domain development samples.
- Dataset
- JSON
ROCStories (+GPT-J)

A corpus and cloze evaluation for deeper understanding of commonsense stories.
- Dataset
- JSON
ROCStories

The ROCStories corpus is a collection of crowdsourced five-sentence everyday stories rich in causal and temporal relations.
- Dataset
- JSON
A Corpus and Cloze Evaluation for Deeper Understanding of Commonsense Stories

A corpus and cloze evaluation for deeper understanding of commonsense stories.
- Dataset
- JSON
GLUE benchmark

The dataset used in the paper is not explicitly described, but it is mentioned that the authors used three downstream tasks from the GLUE benchmark: Stanford Sentiment Treebank...
- Dataset
- JSON
StackOverflow

The paper discusses the use of multi-objective Bayesian optimization for hyperparameter transfer in topic models.
- Dataset
- JSON
BANKING

The BANKING dataset is an intent classification dataset in the banking domain.
- Dataset
- JSON
SNLI

The dataset used in the paper is the Stanford Natural Language Inference (SNLI) dataset, which consists of 549,367 premise-hypothesis pairs for train/dev/test sets and target...
- Dataset
- JSON
BERT: Pre-training of deep bidirectional transformers for language understanding

This paper proposes BERT, a pre-trained deep bidirectional transformer for language understanding.
- Dataset
- JSON
GLUE development set

The GLUE development set is a dataset used for evaluating the performance of language models.
- Dataset
- JSON
GLUE

Pre-trained language models (PrLM) have to carefully manage input units when training on a very large text with a vocabulary consisting of millions of words. Previous works have...
- Dataset
- JSON

You can also access this registry using the API (see API Docs).

16 datasets found