Navigating the Grey Area: How Expressions of Uncertainty and Overconfidence A...
The authors used a variety of datasets for question answering, including TriviaQA, Natural Questions, CountryQA, and Jeopardy questions.
A large annotated corpus for learning natural language inference
Towards Trustworthy AutoGrading of Short, Multi-lingual, Multi-type Answers
The dataset consists of approximately 10 million question-answer pairs in multiple languages, covering diverse fields such as math and language, and strong variation in...
SQuAD: 100,000+ Questions for Machine Comprehension of Text
The SQuAD dataset is a reading-comprehension benchmark of more than 100,000 questions posed by crowdworkers on Wikipedia articles, where the answer to each question is a span of text from the corresponding passage.
Room-to-Room (R2R) dataset
The Room-to-Room (R2R) dataset is a benchmark for vision-and-language navigation. It consists of 7,189 paths sampled from the Matterport3D navigation graphs, each with three...
TruthfulQA
TruthfulQA is a dataset of 817 questions designed to evaluate language models' tendency to mimic human falsehoods.
A general language assistant as a laboratory for alignment
Uses a general-purpose language assistant as a testbed for aligning language models with human users.
Natural Questions
The Natural Questions dataset consists of real, anonymized queries issued to the Google search engine, each paired with a Wikipedia article that may contain the answer.
Visual Genome
The Visual Genome dataset is a large-scale dataset of over 100,000 images, each densely annotated with objects, attributes, and relationships, along with visual question-answer pairs.
WikiTableQuestions
Semantic parsing maps a user-issued natural language (NL) utterance to a machine-executable meaning representation (MR), such as λ-calculus (Zettlemoyer and Collins, 2005), SQL...
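To make the NL-to-MR mapping concrete, here is a minimal sketch in Python. It assumes SQL as the meaning representation and uses a toy table of Summer Olympics host cities in the spirit of a Wikipedia table. The table, the questions, and the helper names build_table and execute are illustrative only and do not come from WikiTableQuestions or its paper. The SQL "parses" are written by hand. A semantic parser would have to predict them from the utterance.

```python
import sqlite3

# Toy table in the spirit of a Wikipedia table (Summer Olympics host cities).
GAMES = [
    (2004, "Athens"),
    (2008, "Beijing"),
    (2012, "London"),
    (2016, "Rio de Janeiro"),
]

# Hand-written utterance -> meaning representation (SQL) pairs.
# A real semantic parser would predict the SQL; here it is written by hand.
PARSES = {
    "Which city hosted the games in 2012?":
        "SELECT city FROM games WHERE year = 2012",
    "How many games are listed?":
        "SELECT COUNT(*) FROM games",
    "Which city hosted the games after Beijing?":
        "SELECT city FROM games "
        "WHERE year > (SELECT year FROM games WHERE city = 'Beijing') "
        "ORDER BY year LIMIT 1",
}


def build_table() -> sqlite3.Connection:
    """Load the toy table into an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE games (year INTEGER, city TEXT)")
    conn.executemany("INSERT INTO games VALUES (?, ?)", GAMES)
    return conn


def execute(conn: sqlite3.Connection, mr: str):
    """Execute a meaning representation and return its single-value answer."""
    row = conn.execute(mr).fetchone()
    return row[0] if row else None


if __name__ == "__main__":
    conn = build_table()
    for utterance, sql in PARSES.items():
        print(f"{utterance}\n  MR: {sql}\n  answer: {execute(conn, sql)}")
```

The third question shows why an executable MR is useful. "After Beijing" becomes a nested query whose answer is computed from the table rather than copied from the text.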