Dataset - LDM

DREAM

The DREAM dataset is a dialogue-based multiple-choice QA dataset, introduced by Sun et al. (2019). It was collected from English-as-a-foreign-language examinations, designed by...
- Dataset
- JSON
Dialogue Summarization

A collection of dialogue summarization datasets, including SAMSum, DialogSum, TODSum, and DREAM.
- Dataset
- JSON
XTREME-UP

The XTREME-UP dataset is a cross-lingual question answering task that asks a model to predict the correct English answer span given a non-English question and an English answer...
- Dataset
- JSON
XOR-ATTRIQA

The XOR-ATTRIQA dataset is a classification task where the model is asked to predict whether the provided answer to the question is supported by the given passage context, which...
- Dataset
- JSON
Stanford Human Preferences (SHP)

The Stanford Human Preferences (SHP) dataset is sourced from various Reddit subreddits that focus on QA. Preferences have been extracted from the accumulated up- and...
- Dataset
- JSON
Pile

The Pile dataset consists of 800 GB of text from 22 domains. Cynical selection naturally prefers text data based on the target corpus.
- Dataset
- JSON
FigureQA

FigureQA is a dataset for visual question answering, containing line plots, bar charts, pie plots, and dot-line plots.
- Dataset
- JSON
UNK-VQA

UNK-VQA is a dataset for evaluating the ability of large language models to answer questions when the answer is unknown.
- Dataset
- JSON
SPADES

This dataset contains question-answer pairs for training a semantic parser.
- Dataset
- JSON
GRAPHQUESTIONS

This dataset contains question-answer pairs for training a semantic parser.
- Dataset
- JSON
TutorQA

The dataset is used for concept graph recovery and question answering in NLP education.
- Dataset
- JSON
Learning to Retrieve Reasoning Paths over Wikipedia Graph for Question Answering

A question answering dataset built on a Wikipedia graph.
- Dataset
- JSON
Natural Questions: A Benchmark for Question Answering Research

A benchmark for question answering research that includes a large dataset of natural questions.
- Dataset
- JSON
WildQA

A video understanding dataset of videos recorded in outdoor settings, supporting video question answering and video evidence selection.
- Dataset
- JSON
WebSRC

WebSRC is a dataset for web-based structural reading comprehension.
- Dataset
- JSON
TIE: Topological Information Enhanced Structural Reading Comprehension on Web...

Topological Information Enhanced model (TIE) for structural reading comprehension on web pages.
- Dataset
- JSON
EQG-RACE

Educational Question Generation (QG) dataset, used to train and evaluate QG models.
- Dataset
- JSON
Walking down the memory maze: beyond context limit through interactive reading

Walking down the memory maze: beyond context limit through interactive reading.
- Dataset
- JSON
WikiWeb2M

WikiWeb2M is a page-level multimodal Wikipedia dataset.
- Dataset
- JSON
Document structure in long document transformers

Document structure in long document transformers.
- Dataset
- JSON

You can also access this registry using the API (see API Docs).

257 datasets found