-
VAULT: VAriable Unified Long Text representation for Machine Reading Comprehen...
VAULT: a lightweight, parallel-efficient paragraph representation for Machine Reading Comprehension (MRC), based on contextualized representations of long document input -
SPAGHETTI: Open-Domain Question Answering
SPAGHETTI: A hybrid open-domain question-answering system that combines semantic parsing and information retrieval to handle structured and unstructured data. -
Google-RE (Templates) dataset
The Google-RE (Templates) dataset contains 6.11K template-based prompts derived from Wikipedia, covering 3 relations. -
Comparing Template-based and Template-free Language Model Probing
Template-based probing uses expert-made templates to create prompts, while template-free probing uses naturally-occurring text. -
LLM dataset
The dataset used in this paper is not explicitly described, but it is mentioned that it is a large language model (LLM) and that the authors used it to train and evaluate their... -
Buffer of Thoughts
Buffer of Thoughts is a novel and versatile thought-augmented reasoning approach for enhancing accuracy, efficiency and robustness of large language models (LLMs). -
Winograd Schema - Knowledge Extraction Using Narrative Chains
The Winograd Schema Challenge (WSC) is a test of machine intelligence, designed to be an improvement on the Turing test. A Winograd Schema consists of a sentence and a... -
Graph-free Multi-hop Reading Comprehension: A Select-to-Guide Strategy
Multi-hop reading comprehension (MHRC) requires not only predicting the correct answer span in the given passage, but also providing a chain of supporting evidence for... -
Navigating the Grey Area: How Expressions of Uncertainty and Overconfidence A...
The authors used a variety of datasets for question answering, including TriviaQA, Natural Questions, CountryQA, and Jeopardy questions. -
BIG-Bench Hard
The BIG-Bench Hard dataset is derived from the original BIG-Bench evaluation suite, focusing on tasks that pose challenges to existing language models. -
Leveraging QA Datasets to Improve Generative Data Augmentation
The paper proposes a method to leverage QA datasets for training generative language models to be context generators for a given question and answer. -
Diverse and Specific Clarification Question Generation with Keywords
Product descriptions on e-commerce websites often suffer from missing important aspects. Clarification question generation (CQ-Gen) can be a promising approach to help alleviate... -
A large annotated corpus for learning natural language inference
A large annotated corpus for learning natural language inference -
Towards Trustworthy AutoGrading of Short, Multi-lingual, Multi-type Answers
The dataset consists of approximately 10 million question-answer pairs from multiple languages covering diverse fields such as math and language, and strong variation in... -
SQuAD: 100,000+ Questions for Machine Comprehension of Text
The SQuAD dataset is a reading comprehension benchmark of 100,000+ questions whose answers are spans of text in the accompanying passages. -
FarFetched: Entity-centric Reasoning and Claim Validation for the Greek Language
FarFetched is a modular framework that enables people to verify any kind of textual claim based on the incorporated evidence from textual news sources. -
Room-to-Room (R2R) dataset
The Room-to-Room (R2R) dataset is a benchmark for vision-and-language navigation tasks. It consists of 7,189 paths sampled from its navigation graphs, each with three... -
Intensionalizing Abstract Meaning Representations: Non-Veridicality and Scope
Abstract Meaning Representation (AMR) is a graphical meaning representation language designed to represent propositional information about argument structure. -
TruthfulQA
The TruthfulQA dataset contains 817 questions designed to evaluate language models' tendency to mimic human falsehoods. -
QQP Dataset
The QQP dataset contains more than 400k question pairs.