27 datasets found

Tags: Visual Question Answering

  • CLEVR-Humans

    The CLEVR-Humans dataset consists of 32,164 questions asked by humans, containing words and reasoning steps that were unseen in CLEVR.
  • LLaVA 158k

    The LLaVA 158k dataset (LLaVA-Instruct-158K) is a large-scale set of multimodal instruction-following data, generated with GPT-4 over COCO images, used for training and evaluating multimodal large language models.
  • Multimodal Robustness Benchmark

    The MMR benchmark is designed to evaluate MLLMs' comprehension of visual content and robustness against misleading questions, ensuring models truly leverage multimodal inputs...
  • Modality-Aware Integration with Large Language Models for Knowledge-based Vis...

    Knowledge-based visual question answering (KVQA) has been extensively studied to answer visual questions with external knowledge, e.g., knowledge graphs (KGs).
  • VQA-CP

    The VQA-CP dataset is a split of the VQA dataset, designed to test generalization skills across changes in the answer distribution between the training and the test sets.
  • GQA-OOD: Out-of-Domain VQA Benchmark

    GQA-OOD is a benchmark dedicated to out-of-domain VQA evaluation.
  • GQA: A New Dataset for Real-World Visual Reasoning and Compositional Question...

    GQA is a new dataset for real-world visual reasoning and compositional question answering.
  • OK-VQA

    The OK-VQA dataset is a visual question answering benchmark requiring external knowledge.
  • FVQA

    FVQA is a fact-based visual question answering dataset, containing 2190 images and 5826 (question, answer) pairs, with supporting facts selected from knowledge bases.
  • Visual7W

    The Visual7W dataset is a visual question answering dataset built on COCO images, with multiple-choice questions spanning seven question categories: what, where, when, who, why, how, and which.
  • VisualBERT

    VisualBERT is a pre-trained model for vision-and-language tasks, implemented on top of PyTorch.
  • Task Driven Image Understanding Challenge (TDIUC)

    The Task Driven Image Understanding Challenge (TDIUC) dataset is a large VQA dataset whose questions are divided into 12 fine-grained categories, proposed to compensate for the bias in the distribution of...
  • VQA 1.0

    The VQA 1.0 dataset is a large-scale visual question answering dataset built on MS COCO images and abstract scenes, with several free-form questions per image and ten human-provided answers per question (the consensus scoring applied to these answers is sketched after this list).
  • VQA

    The VQA dataset is a large-scale visual question answering dataset consisting of open-ended questions about images that require natural-language answers.
  • VQA v2.0

    The VQA v2.0 dataset balances the answers for each question, pairing complementary images that lead to different answers, in order to minimize the effectiveness of learning dataset priors.
  • GQA

    The GQA dataset is a visual question answering dataset that focuses on compositional question answering and visual reasoning over real-world images.
  • TGIF-QA

    The TGIF-QA dataset consists of 165,165 QA pairs drawn from 71,741 animated GIFs. To evaluate spatiotemporal reasoning ability at the video level, TGIF-QA introduces...
  • Visual Text Question Answering (VTQA)

    Visual Text Question Answering (VTQA) is a new challenge with a corresponding dataset that includes 23,781 questions based on 10,124 image-text pairs.
  • VQAv2

    VQAv2 is the balanced second version of the VQA dataset, in which each question is paired with complementary images that lead to different answers.
  • Measuring Machine Intelligence through Visual Question Answering

    Measuring machine intelligence through visual question answering.
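
Several of the VQA-family entries above (VQA 1.0, VQA v2.0/VQAv2) collect ten free-form human answers per question and score a prediction by consensus rather than exact match against a single ground truth. The sketch below is a minimal illustration of the commonly quoted form of that metric, min(#matching annotators / 3, 1); the function name and example answers are hypothetical, and the official evaluation script additionally normalizes answers (articles, number words, punctuation) and averages over annotator subsets.

```python
from collections import Counter

def vqa_consensus_accuracy(predicted: str, human_answers: list[str]) -> float:
    """Simplified VQA soft accuracy: a prediction counts as fully correct
    if at least 3 of the (typically 10) human annotators gave that answer.
    The official script also normalizes answers and averages the score over
    annotator subsets, which is omitted here."""
    counts = Counter(a.strip().lower() for a in human_answers)
    matches = counts[predicted.strip().lower()]
    return min(matches / 3.0, 1.0)

# Ten hypothetical human answers to "How many dogs are in the image?"
humans = ["2", "2", "two", "2", "2", "3", "4", "two", "2 dogs", "2"]
print(vqa_consensus_accuracy("2", humans))  # 1.0   (5 exact matches >= 3)
print(vqa_consensus_accuracy("3", humans))  # ~0.33 (only 1 annotator agrees)
```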