-
Dialogue Summarization
A collection of dialogue summarization datasets, including SAMSum, DialogSum, TODSum, and DREAM. -
Visual Question Answering (VQA)
The VQA dataset consists of 248,349 training questions, 121,512 validation questions and 244,302 testing questions, generated on a total of 123,287 images. -
XOR-ATTRIQA
The XOR-ATTRIQA dataset is a classification task in which the model is asked to predict whether the provided answer to the question is supported by the given passage context, which... -
Stanford Human Preferences (SHP)
The Stanford Human Preferences (SHP) dataset is sourced from various Reddit subreddits that focus on QA. Preferences have been extracted from the accumulated up- and... -
GRAPHQUESTIONS
This dataset contains question-answer pairs for training a semantic parser. -
Learning an Executable Neural Semantic Parser
This paper describes a neural semantic parser that maps natural language utterances onto logical forms which can be executed against a task-specific environment, such as a... -
HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering
HotpotQA is a dataset for diverse, explainable multi-hop question answering. -
Multilingual CommonsenseQA
Multilingual CommonsenseQA (mCSQA) is a dataset for evaluating the common sense reasoning capabilities of multilingual LMs. -
Learning to Retrieve Reasoning Paths over Wikipedia Graph for Question Answering
A question answering dataset that contains a Wikipedia graph. -
Natural Questions: A Benchmark for Question Answering Research
A benchmark for question answering research that includes a large dataset of natural questions. -
Revisiting the Open-Domain Question Answering Pipeline
An open-domain question answering (QA) system built on a new multi-stage pipeline that employs a traditional BM25-based information retriever, RM3-based neural relevance...