VizWiz-VQA
The VizWiz-VQA dataset is a large-scale visual question answering dataset consisting of 4,000 images, each paired with 10 crowd-worker answers.
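Because each question carries 10 reference answers, predictions on VizWiz-VQA (and on VQAv2 below) are typically scored with the standard VQA accuracy metric, where an answer counts as fully correct once at least three annotators gave it. A minimal sketch, omitting the official evaluator's answer normalization and its averaging over 10-choose-9 annotator subsets:

```python
def vqa_accuracy(prediction: str, reference_answers: list[str]) -> float:
    """Simplified VQA accuracy: min(#matching annotators / 3, 1).

    The official evaluator additionally normalizes answers (case,
    punctuation, articles) and averages over annotator subsets.
    """
    pred = prediction.strip().lower()
    matches = sum(a.strip().lower() == pred for a in reference_answers)
    return min(matches / 3.0, 1.0)

# Toy references: "blue" was given by enough annotators for full credit.
refs = ["blue", "blue", "light blue", "blue", "navy",
        "blue", "dark blue", "teal", "blue", "blue"]
print(vqa_accuracy("blue", refs))   # 1.0
print(vqa_accuracy("navy", refs))   # ~0.33 (only one annotator)
```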
VQAv2 dataset
The VQAv2 dataset contains open-ended questions about 265k images, with 5.4 questions per image on average.
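A minimal sketch of reading the official VQAv2 release, where questions and answer annotations live in separate JSON files; the filenames below follow the v2_*_mscoco_* naming from visualqa.org and depend on which split was downloaded:

```python
import json

# Questions file: {"questions": [{"image_id", "question", "question_id"}, ...]}
with open("v2_OpenEnded_mscoco_val2014_questions.json") as f:
    questions = json.load(f)["questions"]
# Annotations file holds the 10 crowd answers per question.
with open("v2_mscoco_val2014_annotations.json") as f:
    annotations = json.load(f)["annotations"]

q, a = questions[0], annotations[0]
print(q["image_id"], q["question"])
print(a["multiple_choice_answer"])               # most common answer
print([ans["answer"] for ans in a["answers"]])   # all 10 crowd answers
```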
CARETS: A Consistency And Robustness Evaluative Test Suite for VQA
CARETS is a systematic test suite to measure consistency and robustness of modern VQA models through a series of six fine-grained capability tests.
CLEVR-Humans
The CLEVR-Humans dataset consists of 32,164 questions asked by humans, containing words and reasoning steps that were unseen in CLEVR.
Image Captioning and Visual Question Answering
The dataset is used for image captioning and visual question answering.
LLaVA-Instruct-150k
LLaVA-Instruct-150k is a visual instruction-tuning dataset of 150k multimodal instruction-following samples generated with GPT-4, spanning conversation, detailed-description, and complex-reasoning prompts.
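The released data is a single JSON file of multi-turn conversation records grounded in COCO images. A sketch of iterating over it, assuming the llava_instruct_150k.json file from the official release:

```python
import json

# The file is a list of records, each with an "image" field and a
# "conversations" list alternating between human and model turns.
with open("llava_instruct_150k.json") as f:
    records = json.load(f)

rec = records[0]
print(rec["image"])  # COCO image filename the conversation refers to
for turn in rec["conversations"]:
    # "from" is "human" or "gpt"; human turns may embed an "<image>" token
    print(turn["from"], ":", turn["value"][:80])
```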
SMART-101 dataset
The dataset for the SMART-101 challenge consists of 101 unique puzzles that require a mix of several elementary skills, including arithmetic, algebra, and spatial reasoning.
CLEVR-CoGenT
The CLEVR-CoGenT dataset is a visual question answering dataset that tests compositional generalization: objects appear with different combinations of attributes (such as shape and color) in the training and test conditions.
GQA-OOD: Out-of-Domain VQA Benchmark
GQA-OOD is a benchmark dedicated to out-of-domain VQA evaluation.
GQA: A New Dataset for Real-World Visual Reasoning and Compositional Question Answering
GQA is a dataset for real-world visual reasoning and compositional question answering. Its questions are generated from image scene graphs and are paired with functional programs that specify the reasoning steps needed to answer them.
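A sketch of inspecting a GQA question file and its attached functional program; the filename and field names below are assumptions based on the official balanced-split release, where each entry stores its program under "semantic":

```python
import json

# The question file is a dict mapping question id -> question record.
with open("val_balanced_questions.json") as f:
    questions = json.load(f)

qid, rec = next(iter(questions.items()))
print(rec["imageId"], rec["question"], "->", rec["answer"])
for step in rec["semantic"]:
    # Each program step names an operation and its argument,
    # e.g. "select" / "chair", "relate" / "left of", "query" / "color".
    print(step["operation"], step["argument"])
```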
High Quality Image Text Pairs
The High Quality Image Text Pairs (HQITP-134M) dataset consists of 134 million diverse and high-quality images paired with descriptive captions and titles.