Natural Language Understanding - Groups

SQuAD: 100,000+ Questions for Machine Comprehension of Text

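SQuAD (Stanford Question Answering Dataset) is a reading comprehension benchmark of 100,000+ crowdsourced questions posed on Wikipedia articles, where the answer to each question is a span of text from the corresponding passage.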
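
A minimal Python sketch of reading SQuAD-format data, assuming the v1.1 JSON layout (`data` > `paragraphs` > `context` and `qas` > `answers`, each answer holding `text` and `answer_start`). Because each answer is stored as a character offset into its passage, slicing the context recovers it, which the sketch checks. The function names and the toy record are illustrative, not part of any official loader.

```python
import json
from typing import Iterator


def load_squad(path: str) -> dict:
    """Read a SQuAD-style JSON file from a local path (path is up to the caller)."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def iter_squad_examples(squad: dict) -> Iterator[dict]:
    """Flatten SQuAD v1.1-style JSON into one record per (question, answer)."""
    for article in squad["data"]:
        for paragraph in article["paragraphs"]:
            context = paragraph["context"]
            for qa in paragraph["qas"]:
                for answer in qa["answers"]:
                    start = answer["answer_start"]
                    end = start + len(answer["text"])
                    # The answer must be an exact character span of the passage.
                    if context[start:end] != answer["text"]:
                        raise ValueError(f"answer span mismatch in question {qa['id']}")
                    yield {
                        "id": qa["id"],
                        "question": qa["question"],
                        "context": context,
                        "answer": answer["text"],
                        "span": (start, end),
                    }


if __name__ == "__main__":
    # Hypothetical toy record in the SQuAD layout, not taken from the dataset.
    ctx = "The Eiffel Tower is located in Paris."
    toy = {"version": "1.1", "data": [{"title": "Toy", "paragraphs": [{
        "context": ctx,
        "qas": [{"id": "q1", "question": "Where is the Eiffel Tower located?",
                 "answers": [{"text": "Paris", "answer_start": ctx.find("Paris")}]}],
    }]}]}
    for ex in iter_squad_examples(toy):
        print(ex["question"], "->", ex["answer"], ex["span"])
```

Yielding one record per answer rather than per question keeps evaluation splits, where a question can carry several reference answers, in the same flat shape.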

ROCStories

The ROCStories corpus is a collection of crowdsourced five-sentence everyday stories rich in causal and temporal relations.

GLUE benchmark

The paper does not describe its dataset in detail, but states that the authors used three downstream tasks from the GLUE benchmark: Stanford Sentiment Treebank...

SNLI

The paper uses the Stanford Natural Language Inference (SNLI) dataset, which consists of premise-hypothesis pairs divided into train/dev/test sets (549,367 pairs in the training set) and target...
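
A minimal Python sketch of loading SNLI pairs, assuming the SNLI 1.0 JSONL release, where each line stores the premise in `sentence1`, the hypothesis in `sentence2`, and the consensus label in `gold_label`. Pairs on which annotators reached no majority carry the label `-` and are usually dropped before training. The integer label mapping in `LABELS` and the helper names are hypothetical choices for illustration.

```python
import json
from typing import Iterable, Iterator, Tuple

# Hypothetical label-to-id mapping; any fixed mapping works.
LABELS = {"entailment": 0, "neutral": 1, "contradiction": 2}


def load_snli_pairs(lines: Iterable[str]) -> Iterator[Tuple[str, str, int]]:
    """Yield (premise, hypothesis, label_id) from SNLI 1.0 JSONL lines."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        label = record["gold_label"]
        # "-" marks pairs with no annotator majority; skip anything unlabeled.
        if label not in LABELS:
            continue
        yield record["sentence1"], record["sentence2"], LABELS[label]


def count_labeled_pairs(path: str) -> int:
    """Count usable pairs in a local SNLI JSONL file (path is an assumption)."""
    with open(path, encoding="utf-8") as f:
        return sum(1 for _ in load_snli_pairs(f))
```

Running `count_labeled_pairs` on a local copy of the training split is one way to check the 549,367 figure quoted above against the file actually in use.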

BERT: Pre-training of deep bidirectional transformers for language understanding

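This paper proposes BERT, a deep bidirectional Transformer encoder that is pre-trained on unlabeled text with masked language modeling and next-sentence prediction objectives and then fine-tuned for language understanding tasks.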
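
BERT's masked language modeling objective chooses 15% of token positions as prediction targets; each chosen token is replaced by `[MASK]` 80% of the time, by a random vocabulary token 10% of the time, and left unchanged 10% of the time. The Python sketch below illustrates that corruption step under simplifying assumptions: it selects each position independently with probability `mask_prob` and does not exclude special tokens such as `[CLS]` and `[SEP]`, which a real pipeline would. `mask_tokens` is a hypothetical helper, not the reference implementation.

```python
import random
from typing import List, Optional, Sequence, Tuple

MASK = "[MASK]"


def mask_tokens(
    tokens: Sequence[str],
    vocab: Sequence[str],
    mask_prob: float = 0.15,
    rng: Optional[random.Random] = None,
) -> Tuple[List[str], List[Optional[str]]]:
    """Return (corrupted tokens, targets); targets[i] is None where nothing is predicted."""
    rng = rng or random.Random()
    corrupted = list(tokens)
    targets: List[Optional[str]] = [None] * len(tokens)
    for i, tok in enumerate(tokens):
        # Select this position as a prediction target with probability mask_prob.
        if rng.random() >= mask_prob:
            continue
        targets[i] = tok  # the model must recover the original token here
        r = rng.random()
        if r < 0.8:
            corrupted[i] = MASK  # 80%: replace with the mask token
        elif r < 0.9:
            corrupted[i] = rng.choice(vocab)  # 10%: replace with a random token
        # remaining 10%: keep the original token unchanged
    return corrupted, targets
```

For example, `mask_tokens(["my", "dog", "is", "hairy"], vocab, rng=random.Random(0))` corrupts a short sentence reproducibly; the loss is computed only at positions whose target is not `None`.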

GLUE

Pre-trained language models (PrLMs) must carefully choose their input units when trained on very large text corpora whose vocabularies contain millions of words. Previous works have...
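
One common way to keep the input vocabulary manageable is subword segmentation. As an illustration only (not necessarily the method of the paper summarized above), the Python sketch below implements greedy longest-match-first segmentation in the style of BERT's WordPiece tokenizer: a word is split into the longest vocabulary pieces available, word-internal pieces carry a `##` prefix, and a word that cannot be segmented maps to `[UNK]`. The function name and toy vocabulary are hypothetical.

```python
from typing import List, Set


def wordpiece(word: str, vocab: Set[str], unk: str = "[UNK]") -> List[str]:
    """Greedy longest-match-first segmentation of one word into subword units."""
    pieces: List[str] = []
    start = 0
    while start < len(word):
        end = len(word)
        match = None
        # Try the longest remaining substring first, shrinking until a vocab hit.
        while start < end:
            piece = word[start:end]
            if start > 0:
                piece = "##" + piece  # continuation pieces carry a "##" prefix
            if piece in vocab:
                match = piece
                break
            end -= 1
        if match is None:
            return [unk]  # no segmentation exists: fall back to the unknown token
        pieces.append(match)
        start = end
    return pieces
```

With the toy vocabulary `{"un", "##aff", "##able"}`, `wordpiece("unaffable", ...)` returns `["un", "##aff", "##able"]`: a rare word is covered by a few frequent units instead of needing its own vocabulary entry.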