27 datasets found

Tags: visual question answering

  • SMART-101 dataset

    The dataset for the SMART-101 challenge consists of 101 unique puzzles that require a mix of several elementary skills, including arithmetic, algebra, and spatial reasoning,...
  • High Quality Image Text Pairs

    The High Quality Image Text Pairs (HQITP-134M) dataset consists of 134 million diverse and high-quality images paired with descriptive captions and titles.
  • OK-VQA

    The OK-VQA dataset is a visual question answering benchmark requiring external knowledge.
  • Visual7W

    Visual7W is a dataset for Visual Question Answering (VQA), containing 7,000 images with 7,000 queries.
  • Mutan: Multimodal Tucker Fusion for Visual Question Answering

    The dataset used in the paper is a collection of images and corresponding referring expressions.
  • VQAvs

    VQAvs is a dataset for visual question answering, containing questions answerable from the images alone.
  • VQA-CPv1 and VQA-CPv2

    VQA-CPv1 and VQA-CPv2 (Changing Priors) are visual question answering datasets that restructure the VQA splits so that answer distributions differ between training and test sets.
  • Object Attribute Matters in Visual Question Answering

    Visual question answering is a multimodal task that requires the joint comprehension of visual and textual information. The proposed approach utilizes object attributes to...
  • SpatialSense

    A dataset for visual spatial relationship classification (VSRC) with nine well-defined spatial relations.
  • Winoground

    The Winoground dataset consists of 400 items, each containing two image-caption pairs (I0, C0), (I1, C1).
  • VQA

    The VQA dataset is a large-scale visual question answering dataset consisting of images paired with open-ended questions that require natural language answers.
  • MovieQA, TVQA, AVSD, EQA, Embodied QA

    A collection of datasets for visual question answering, including MovieQA, TVQA, AVSD, EQA, and Embodied QA.
  • VQA v2.0

    The VQA v2.0 dataset is used to evaluate the proposed joint model; its answers are balanced to minimize the effectiveness of learning dataset priors.
  • GQA

    The GQA dataset is a visual question answering dataset focused on compositional question answering and visual reasoning over real-world images.
  • Conceptual Captions 12M

    The Conceptual Captions 12M (CC-12M) dataset consists of 12 million diverse and high-quality images paired with descriptive captions and titles.
  • Sort-of-CLEVR

    Sort-of-CLEVR is a simplified visual question answering dataset used to evaluate relational reasoning.
  • CLEVR dataset

    The CLEVR dataset is a diagnostic dataset for visual question answering, where each rendered image is annotated with compositional questions.
  • Visual7W dataset

    The Visual7W dataset is a visual question answering dataset consisting of images paired with corresponding questions.
  • VQAv2

    Visual Question Answering (VQA) has achieved great success thanks to the fast development of deep neural networks (DNN). On the other hand, the data augmentation, as one of the...
  • Conceptual Captions

    The dataset used in the paper "Scaling Laws of Synthetic Images for Model Training". The dataset is used for supervised image classification and zero-shot classification tasks.