15 datasets found

Tags: image captioning

  • VizWiz-VQA

    The VizWiz-VQA dataset is a large-scale visual question answering dataset consisting of 4,000 images, each paired with a question and 10 crowd-worker answers.
  • TallyQA

    The TallyQA dataset is a large-scale open-ended visual counting dataset, which is well-suited to study statistical shortcuts.
  • High Quality Image Text Pairs

    The High Quality Image Text Pairs (HQITP-134M) dataset consists of 134 million diverse and high-quality images paired with descriptive captions and titles.
  • OK-VQA

    The OK-VQA dataset is a visual question answering benchmark requiring external knowledge.
  • Winoground

    The Winoground dataset consists of 400 items, each containing two image-caption pairs (I0, C0), (I1, C1).
  • VQA

    The VQA dataset is a large-scale visual question answering dataset consisting of images paired with open-ended questions that require natural language answers.
  • GQA

    The GQA dataset is a visual question answering dataset that focuses on compositional question answering and visual reasoning over real-world images.
  • Conceptual Captions 12M

    The Conceptual Captions 12M (CC-12M) dataset consists of 12 million diverse and high-quality images paired with descriptive captions and titles.
  • Conceptual Captions

    The Conceptual Captions dataset as used in the paper "Scaling Laws of Synthetic Images for Model Training", where it supports supervised image classification and zero-shot classification tasks.
  • Amazon Berkeley Objects Dataset (ABO)

    The Amazon Berkeley Objects Dataset (ABO) is a publicly available e-commerce dataset with multiple images per product.
  • Visual Genome

    The Visual Genome dataset is a large-scale dataset connecting language and vision, containing images densely annotated with entities, attributes, and relationships, along with region descriptions and question-answer pairs.
  • MS-COCO

    Large-scale datasets [18, 17, 27, 6] have boosted text-conditional image generation quality. However, in some domains it can be difficult to construct such datasets, and usually it could...
  • COCO-QA

    The COCO-QA dataset is used for the visual question answering task. It consists of 123,287 images with 78,736 training and 38,948 test questions.
  • Microsoft COCO

    The Microsoft COCO dataset was used for training and evaluating the CNNs because it has become a standard benchmark for testing algorithms aimed at scene understanding and...
  • MSCOCO

    Human Pose Estimation (HPE) aims to estimate the position of each joint point of the human body in a given image. HPE tasks support a wide range of downstream tasks such as...
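The Winoground entry above describes items as two image-caption pairs (I0, C0), (I1, C1); a model is typically scored on whether it matches each caption to its own image rather than the swapped one. As a minimal sketch (assuming the standard text/image/group scoring from the Winoground benchmark, with `s_ij` denoting a model's similarity score for caption `C_i` against image `I_j`; the function names are illustrative, not from any particular library):

```python
def text_correct(s00, s01, s10, s11):
    # For each image, the matching caption must outscore the other caption.
    return s00 > s10 and s11 > s01

def image_correct(s00, s01, s10, s11):
    # For each caption, the matching image must outscore the other image.
    return s00 > s01 and s11 > s10

def group_correct(s00, s01, s10, s11):
    # An item counts for the group score only if both directions are correct.
    return text_correct(s00, s01, s10, s11) and image_correct(s00, s01, s10, s11)

# Example: a model that cleanly prefers the matching pairs passes all three.
scores = dict(s00=0.9, s01=0.1, s10=0.2, s11=0.8)
print(group_correct(**scores))
```

Averaging each predicate over all 400 items yields the benchmark's text, image, and group accuracies.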