Visual Question Answering - Groups

SpatialSense

A dataset for visual spatial relationship classification (VSRC) with nine well-defined spatial relations.

Winoground

The Winoground dataset consists of 400 items, each containing two image-caption pairs (I0, C0), (I1, C1).

VQA 1.0

The VQA 1.0 dataset is a large-scale dataset for visual question answering, containing 15,000 images with 50,000 questions.

VQA

The VQA dataset is a large-scale visual question answering dataset consisting of image-question pairs that require natural language answers.

Semantic Equivalent Adversarial Data Augmentation for Visual Question Answering

Visual Question Answering (VQA) has achieved great success thanks to the fast development of deep neural networks (DNN). On the other hand, the data augmentation, as one of the...

MovieQA, TVQA, AVSD, EQA, Embodied QA

A collection of datasets for visual question answering, including MovieQA, TVQA, AVSD, EQA, and Embodied QA.

Visual Spatial Reasoning

Visual Spatial Reasoning (VSR) is a controlled probing dataset for testing vision-language models' capabilities of recognizing and reasoning about spatial relations in natural...

VQA v2.0

We use the VQA v2.0 dataset for the evaluation of our proposed joint model, where the answers are balanced in order to minimize the effectiveness of learning dataset priors.

GQA

The GQA dataset is a visual question answering dataset that focuses on compositional question answering and visual reasoning about real-world images.

TGIF-QA

The TGIF-QA dataset consists of 165,165 QA pairs chosen from 71,741 animated GIFs. To evaluate the spatiotemporal reasoning ability at the video level, TGIF-QA dataset designs...

VQA-CP v2

This paper proposes VQA-CP v2, a standard out-of-distribution (OOD) benchmark in VQA.

Compressing and Debiasing Vision-Language Pre-Trained Models for Visual Quest...

This paper investigates whether a vision-language pre-trained model (VLP) can be compressed and debiased simultaneously by searching for sparse and robust subnetworks.

Conceptual Captions 12M

The Conceptual Captions 12M (CC-12M) dataset consists of 12 million diverse and high-quality images paired with descriptive captions and titles.

Sort-of-CLEVR

The dataset used in the paper is Sort-of-CLEVR, a visual question answering dataset.

VQA-CP v2 and VQA 2.0

The datasets used in the paper are VQA-CP v2 and VQA 2.0, two standard datasets for visual question answering.

Meta-VQA

The Meta-VQA dataset is a modification of the VQA v2.0 dataset for visual question answering, composed of 1,234 unique tasks (questions), split into 870 training tasks and 373...

CLEVR dataset

The CLEVR dataset is a dataset for visual question answering, where each image is annotated with a question.

Visual7W dataset

The Visual7W dataset is a visual question answering dataset, which consists of images and corresponding questions.

VQAv2

Visual Question Answering (VQA) has achieved great success thanks to the fast development of deep neural networks (DNN). On the other hand, the data augmentation, as one of the...

Extended RSVQAxBEN

The extended RSVQAxBEN dataset is an extension of the RSVQAxBEN dataset, including all the spectral bands of Sentinel-2 images with 10 m and 20 m spatial resolution.

58 datasets found