-
LLM dataset
The dataset used in this paper is not explicitly described; the paper only states that it was used to train and evaluate the authors' large language model (LLM). -
CLEVR-Humans
The CLEVR-Humans dataset consists of 32,164 questions asked by humans, containing words and reasoning steps that were unseen in CLEVR. -
MMLU dataset
The dataset used in the paper is the Massive Multitask Language Understanding (MMLU) benchmark, which consists of 57 tasks spanning STEM, the humanities, the social sciences, and other areas. -
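As a quick illustration of the benchmark's format, the sketch below loads one of the 57 subjects. It assumes the `cais/mmlu` copy on the Hugging Face Hub and the `datasets` library, neither of which is mentioned in the source.

```python
# Minimal sketch of inspecting MMLU, assuming the `cais/mmlu` release on the
# Hugging Face Hub; subject names and field layout follow that release.
from datasets import load_dataset

# Each MMLU subject is a separate configuration; "abstract_algebra" is one of the 57.
mmlu = load_dataset("cais/mmlu", "abstract_algebra")

sample = mmlu["test"][0]
print(sample["question"])   # question text
print(sample["choices"])    # four answer options
print(sample["answer"])     # index of the correct option (0-3)
```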
LLaVA 158k
The LLaVA 158k dataset is a large-scale collection of roughly 158,000 multimodal instruction-following samples, used for training and evaluating multimodal large language models. -
Multimodal Robustness Benchmark
The MMR benchmark is designed to evaluate MLLMs' comprehension of visual content and robustness against misleading questions, ensuring models truly leverage multimodal inputs... -
Survey Questions
The dataset used in this research is a set of 50 questions from diverse fields such as general knowledge, food, and travel. -
Knowledge Graph-Enhanced Large Language Models via Path Selection
Two datasets, MetaQA and FACTKG, are used to evaluate the effectiveness of the proposed method, KELP. MetaQA is a widely used benchmark containing subsets of questions that require one, two, or three hops of reasoning over a movie-domain knowledge graph, while FACTKG is a fact verification dataset grounded in a knowledge graph. -
SQuAD 1.1 and SQuAD 2
The SQuAD 1.1 and SQuAD 2.0 datasets are used to evaluate the performance of the EQuANt model; SQuAD 2.0 extends SQuAD 1.1 with unanswerable questions. -
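For reference, the sketch below contrasts the two versions. The `squad` and `squad_v2` names and the answer field layout refer to the public releases on the Hugging Face Hub, which are assumptions here, not EQuANt's own pipeline.

```python
# Illustrative sketch, assuming the public `squad` and `squad_v2` releases on
# the Hugging Face Hub; EQuANt's preprocessing is not shown.
from datasets import load_dataset

squad_v1 = load_dataset("squad", split="validation")
squad_v2 = load_dataset("squad_v2", split="validation")

# In SQuAD 1.1 every question has at least one answer span; SQuAD 2.0 adds
# unanswerable questions, which appear with an empty `answers["text"]` list.
unanswerable = [ex for ex in squad_v2 if len(ex["answers"]["text"]) == 0]
print(f"SQuAD 2.0 validation: {len(unanswerable)} unanswerable "
      f"out of {len(squad_v2)} questions.")
```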
Question Answering
The task is to predict whether the number of edges assigned the label x is greater than the number of edges assigned the label y. -
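The description is terse, so the sketch below shows the task as literally stated: count edges by their assigned label and compare. The `(edge, label)` representation and the helper `more_x_than_y` are hypothetical; the dataset's actual format is not given in the source.

```python
# Illustrative sketch of the task as described: given a graph whose edges carry
# labels, decide whether more edges are assigned label x than label y.
# The (edge, label) list used here is a hypothetical format, not the dataset's.
from collections import Counter

def more_x_than_y(labeled_edges, x, y):
    counts = Counter(label for _, label in labeled_edges)
    return counts[x] > counts[y]

edges = [((0, 1), "x"), ((1, 2), "y"), ((2, 3), "x"), ((3, 0), "x")]
print(more_x_than_y(edges, "x", "y"))  # True: 3 edges labeled x vs 1 labeled y
```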
Leveraging QA Datasets to Improve Generative Data Augmentation
The paper proposes a method that leverages QA datasets to train generative language models as context generators for a given question-answer pair. -
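A minimal sketch of the idea follows: prompt a generative LM with a question-answer pair and ask it to produce a supporting context passage. The prompt template, the `gpt2` model, and the generation settings are illustrative assumptions, not the paper's configuration.

```python
# Illustrative sketch only: prompt a generative LM to produce a context passage
# for a given question-answer pair. Model choice and prompt wording are
# assumptions for illustration, not the paper's actual setup.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")

question = "Where is the Eiffel Tower located?"
answer = "Paris"
prompt = (f"Question: {question}\nAnswer: {answer}\n"
          "Write a short passage that supports this answer:\n")

output = generator(prompt, max_new_tokens=60, do_sample=True)[0]["generated_text"]
print(output[len(prompt):])  # generated context passage
```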
A large annotated corpus for learning natural language inference
This is the Stanford Natural Language Inference (SNLI) corpus: about 570,000 human-written English sentence pairs labeled as entailment, contradiction, or neutral. -
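A minimal sketch of inspecting the corpus, assuming the public `snli` release on the Hugging Face Hub rather than any distribution used in the paper:

```python
# Minimal sketch, assuming the public `snli` dataset on the Hugging Face Hub.
from datasets import load_dataset

snli = load_dataset("snli", split="train")

example = snli[0]
print(example["premise"])
print(example["hypothesis"])
# Labels: 0 = entailment, 1 = neutral, 2 = contradiction; -1 marks pairs without
# a gold label and is usually filtered out before training.
print(example["label"])
```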
QNLI Textual Entailment dataset
QNLI is a textual entailment dataset derived from SQuAD as part of the GLUE benchmark; the version used in this paper is a noisily annotated variant produced by a zero-shot-learner-based module. -
GraphQueries
The task of Question Answering over Linked Data (QALD) has received increased attention in recent years (see the surveys [14] and [36]). The task consists of mapping natural language questions to formal queries, typically SPARQL, over linked data. -
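To make the mapping concrete, the sketch below pairs one natural language question with a hand-written SPARQL query and runs it against the public DBpedia endpoint via SPARQLWrapper; the question/query pair is illustrative and not taken from GraphQueries.

```python
# Illustrative QALD-style mapping: a natural language question and a hand-written
# SPARQL counterpart, executed against the public DBpedia endpoint.
# The question/query pair is an assumption for illustration, not from GraphQueries.
from SPARQLWrapper import SPARQLWrapper, JSON

# Question: "Who wrote The Lord of the Rings?"
query = """
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dbr: <http://dbpedia.org/resource/>
SELECT ?author WHERE {
  dbr:The_Lord_of_the_Rings dbo:author ?author .
}
"""

endpoint = SPARQLWrapper("https://dbpedia.org/sparql")
endpoint.setQuery(query)
endpoint.setReturnFormat(JSON)
results = endpoint.query().convert()

for binding in results["results"]["bindings"]:
    print(binding["author"]["value"])  # e.g. the DBpedia resource for J. R. R. Tolkien
```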
AMUSE: Multilingual Semantic Parsing for Question Answering over Linked Data
The task of answering natural language questions over RDF data has received wide interest in recent years, in particular in the context of the series of QALD benchmarks. The... -
General Language Understanding Evaluation (GLUE) dataset
The General Language Understanding Evaluation (GLUE) benchmark is a collection of nine natural language understanding tasks, used in the paper to evaluate model performance. -
ForecastTKGQuestions
A benchmark for temporal question answering and forecasting over temporal knowledge graphs.