ATIS2 and ATIS3
The ATIS2 and ATIS3 datasets contain spoken queries from the air travel information domain and are used to create low-latency natural language understanding components. -
General Language Understanding Evaluation (GLUE) dataset
The General Language Understanding Evaluation (GLUE) benchmark is a collection of natural language understanding tasks used in the paper to evaluate model performance. -
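Several GLUE tasks report the Matthews correlation coefficient (CoLA, for example) rather than plain accuracy. As a minimal sketch, a pure-Python implementation for binary labels might look like this (the function name and toy inputs are illustrative, not from the paper):

```python
import math

def matthews_corrcoef(y_true, y_pred):
    """Matthews correlation coefficient for binary labels (0/1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention: return 0 when any confusion-matrix margin is empty.
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom
```

Perfect agreement yields 1.0 and perfectly inverted predictions yield -1.0, which makes the metric more informative than accuracy on the class-imbalanced GLUE tasks.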
FewCLUE dataset
The FewCLUE dataset is a Chinese few-shot learning evaluation benchmark. -
WALNUT: A Benchmark on Semi-weakly Supervised Learning for Natural Language Understanding
WALNUT is a benchmark for semi-weakly supervised learning for natural language understanding. It consists of 8 NLU tasks of different types, including document-level and... -
SQuAD: 100,000+ Questions for Machine Comprehension of Text
The SQuAD dataset is a reading comprehension benchmark of 100,000+ questions posed on Wikipedia articles, where the answer to each question is a span of text from the corresponding passage. -
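Because SQuAD answers are free-form text spans, systems are typically scored with exact match and token-level F1 against the gold span. A sketch of the F1 computation, in the style of the answer normalization used by the official SQuAD evaluation (lowercasing, dropping punctuation and articles):

```python
import re
import string
from collections import Counter

def normalize_answer(s):
    """Lowercase, strip punctuation and articles, collapse whitespace."""
    s = s.lower()
    s = "".join(ch for ch in s if ch not in string.punctuation)
    s = re.sub(r"\b(a|an|the)\b", " ", s)
    return " ".join(s.split())

def f1_score(prediction, ground_truth):
    """Token-level F1 between a predicted and a gold answer span."""
    pred_tokens = normalize_answer(prediction).split()
    gold_tokens = normalize_answer(ground_truth).split()
    common = Counter(pred_tokens) & Counter(gold_tokens)
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)
```

The normalization step is what lets a prediction like "the cat" match a gold answer of "cat" exactly.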
IPA dataset
The IPA dataset contains a set of Chinese utterances collected and annotated during the development of a commercialized Intelligent Personal Assistant (IPA) named... -
OSQ dataset
The OSQ dataset covers 150 in-domain (IND) intents and also provides a set of manually labeled Out-of-Scope Queries (OSQ) that the current system does not support. -
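A common baseline for flagging out-of-scope queries like those in OSQ is to threshold the intent classifier's maximum softmax probability. A minimal sketch, where the function names and the 0.7 threshold are illustrative rather than taken from the paper:

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of intent logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def predict_intent(logits, threshold=0.7):
    """Return the argmax intent index, or None when the model's
    confidence falls below the threshold (treated as out-of-scope)."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=lambda i: probs[i])
    return best if probs[best] >= threshold else None
```

A confident logit vector maps to one of the 150 IND intents, while a flat, low-confidence one is rejected as out-of-scope.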
TreeMix: Compositional Constituency-based Data Augmentation for Natural Language Understanding
TreeMix is a compositional data augmentation approach for natural language understanding. It leverages constituency parse trees to decompose sentences into sub-structures and... -
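TreeMix itself swaps constituency subtrees between sentence pairs. As a simplified illustration that substitutes random contiguous token spans for parsed subtrees, with the labels mixed in proportion to how many tokens each source sentence contributes (all names here are hypothetical):

```python
import random

def span_mix(tokens_a, tokens_b, label_a, label_b, rng):
    """Toy stand-in for TreeMix: replace a random contiguous span in
    sentence A with a random span from sentence B, and weight the two
    labels by each source's share of the mixed sentence."""
    i = rng.randrange(len(tokens_a))            # span [i, j) removed from A
    j = rng.randrange(i + 1, len(tokens_a) + 1)
    k = rng.randrange(len(tokens_b))            # span [k, l) taken from B
    l = rng.randrange(k + 1, len(tokens_b) + 1)
    mixed = tokens_a[:i] + tokens_b[k:l] + tokens_a[j:]
    n_a = len(tokens_a) - (j - i)               # tokens kept from A
    n_b = l - k                                 # tokens taken from B
    lam = n_a / (n_a + n_b)
    return mixed, {label_a: lam, label_b: 1 - lam}
```

The mixed label is a soft distribution over the two original labels, mirroring how mixup-style augmentation interpolates training targets.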
ROCStories (+GPT-J)
A corpus and cloze evaluation for deeper understanding of commonsense stories. -
ROCStories
The ROCStories corpus is a collection of crowdsourced five-sentence everyday stories rich in causal and temporal relations. -
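The companion Story Cloze Test asks a model to pick the correct fifth sentence for a four-sentence context. A trivial word-overlap baseline, purely illustrative of the task format:

```python
def choose_ending(context_sentences, candidate_endings):
    """Pick the candidate ending sharing the most words with the
    four-sentence context (a naive Story Cloze baseline)."""
    context_words = set(
        w.lower() for s in context_sentences for w in s.split()
    )
    def overlap(ending):
        return sum(1 for w in ending.lower().split() if w in context_words)
    return max(range(len(candidate_endings)),
               key=lambda i: overlap(candidate_endings[i]))
```

Such surface-overlap baselines perform poorly on the real test, which is exactly why the cloze evaluation probes causal and temporal commonsense rather than lexical matching.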
A Corpus and Cloze Evaluation for Deeper Understanding of Commonsense Stories
A corpus and cloze evaluation for deeper understanding of commonsense stories. -
GLUE benchmark
The dataset used in the paper is not explicitly described, but the authors mention using three downstream tasks from the GLUE benchmark: Stanford Sentiment Treebank... -
Natural Instructions
The Natural Instructions (NI) dataset is used for evaluating the performance of the DEPTH model on natural language understanding tasks. -
Bing dataset
The Bing dataset is a large-scale dataset for natural language understanding and question answering. -
MS MARCO dataset
The MS MARCO dataset is a large-scale machine reading comprehension and question answering dataset built from real, anonymized Bing search queries. -
BERT: Pre-training of deep bidirectional transformers for language understanding
This paper proposes BERT, a deep bidirectional Transformer pre-trained for language understanding.