12 datasets found

Tags: natural language understanding

  • SimpleWiki

    A dataset for the task of identifying whether a desire expressed by a subject in a given short piece of text was fulfilled.
  • SQuAD: 100,000+ Questions for Machine Comprehension of Text

    The SQuAD dataset is a benchmark for natural language understanding, focused on machine reading comprehension and question answering.
  • MASSIVE

    The MASSIVE dataset is a comprehensive collection of approximately one million annotated utterances for various natural language understanding tasks such as slot-filling, intent...
  • TreeMix: Compositional Constituency-based Data Augmentation for Natural Langu...

    TreeMix is a compositional data augmentation approach for natural language understanding. It leverages constituency parsing trees to decompose sentences into sub-structures and...
  • CoQA

    The CoQA dataset is a benchmark for conversational question answering research. It consists of questions and answers collected over passages, with each question depending on the conversation history.
  • Natural Instructions

    The Natural Instructions (NI) dataset is used for evaluating the performance of the DEPTH model on natural language understanding tasks.
  • SQuAD

    The dataset used in the paper is a multiple-choice reading comprehension dataset, which includes a passage, question, and answer. The passage is a script, and the question is a...
  • Natural Questions

    The Natural Questions dataset consists of questions extracted from web queries, with each question accompanied by a corresponding Wikipedia article containing the answer.
  • Bing dataset

    The Bing dataset is a large-scale dataset for natural language understanding and question answering.
  • MS MARCO dataset

    The MS MARCO dataset is a large-scale dataset for natural language understanding and question answering.
  • SQuAD 2.0

    The SQuAD 2.0 dataset is a challenging task for natural language processing, which requires that a machine read, understand, and answer questions about a text. The dataset...
  • GLUE

    GLUE (General Language Understanding Evaluation) is a multi-task benchmark for natural language understanding, covering tasks such as textual entailment, sentiment analysis, and sentence similarity.
You can also access this registry using the API (see API Docs).
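A registry query like the one above can also be handled programmatically. The sketch below is illustrative only: the response shape (`count`, `datasets`, `tags` fields) is an assumption, not the registry's documented schema (see the API Docs for the real endpoint and field names). It shows client-side filtering of a dataset listing by tag.

```python
import json

# Hypothetical response shape for a registry query; the real API's
# endpoint and field names may differ (consult the API Docs).
SAMPLE_RESPONSE = json.loads("""
{
  "count": 3,
  "datasets": [
    {"name": "SQuAD", "tags": ["natural language understanding", "question answering"]},
    {"name": "CoQA",  "tags": ["question answering"]},
    {"name": "GLUE",  "tags": ["natural language understanding"]}
  ]
}
""")

def filter_by_tag(response, tag):
    """Return the names of datasets whose tag list contains `tag`."""
    return [d["name"] for d in response["datasets"] if tag in d["tags"]]

print(filter_by_tag(SAMPLE_RESPONSE, "natural language understanding"))
# → ['SQuAD', 'GLUE']
```

Filtering client-side keeps the example self-contained; a real client would typically pass the tag as a query parameter and let the server do the filtering.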