Natural Language Understanding - Groups

ATIS2 and ATIS3

The ATIS2 and ATIS3 datasets are used to create low-latency natural language understanding components.

Dataset
JSON

General Language Understanding Evaluation (GLUE) dataset

The General Language Understanding Evaluation (GLUE) benchmark is a collection of datasets used to evaluate the performance of natural language understanding models.

Dataset
JSON

FewCLUE dataset

The FewCLUE dataset is a Chinese few-shot learning evaluation benchmark.

Dataset
JSON

WALNUT: A Benchmark on Semi-weakly Supervised Learning for Natural Language U...

WALNUT is a benchmark for semi-weakly supervised learning for natural language understanding. It consists of 8 NLU tasks with different types, including document-level and...

Dataset
JSON

CoLA

The CoLA dataset has 8,551 training samples and 527 in-domain development samples.

Dataset
JSON

ROCStories (+GPT-J)

A corpus and cloze evaluation for deeper understanding of commonsense stories.

Dataset
JSON

ROCStories

The ROCStories corpus is a collection of crowdsourced five-sentence everyday stories rich in causal and temporal relations.

Dataset
JSON

A Corpus and Cloze Evaluation for Deeper Understanding of Commonsense Stories

A corpus and cloze evaluation for deeper understanding of commonsense stories.

Dataset
JSON

GLUE benchmark

The paper does not explicitly describe its dataset, but the authors report using three downstream tasks from the GLUE benchmark: Stanford Sentiment Treebank...

Dataset
JSON

SNLI

The dataset used in the paper is the Stanford Natural Language Inference (SNLI) dataset, which consists of 549,367/9,842/9,824 premise-hypothesis pairs for the train/dev/test sets and target...

Dataset
JSON

BERT: Pre-training of deep bidirectional transformers for language understanding

This paper proposes BERT, a pre-trained deep bidirectional transformer for language understanding.

Dataset
JSON

GLUE

Pre-trained language models (PrLMs) have to carefully manage input units when training on very large text with a vocabulary consisting of millions of words. Previous works have...

Dataset
JSON

12 datasets found