Visual Question Answering - Groups

VQA-CP

The VQA-CP dataset is a split of the VQA dataset, designed to test generalization skills across changes in the answer distribution between the training and the test sets.
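
The kind of distribution change described above can be made concrete by comparing, for each question type, how often each answer occurs in the training set and in the test set. The following is a minimal sketch under stated assumptions, not the procedure used to build VQA-CP: the record fields `question_type` and `answer`, the helper names `answer_priors` and `prior_shift`, and the toy records are all hypothetical and chosen only for illustration.

```python
from collections import Counter, defaultdict


def answer_priors(examples):
    """Return {question_type: {answer: relative frequency}} for a list of records."""
    counts = defaultdict(Counter)
    for ex in examples:
        counts[ex["question_type"]][ex["answer"]] += 1
    return {
        qtype: {ans: n / sum(c.values()) for ans, n in c.items()}
        for qtype, c in counts.items()
    }


def prior_shift(train, test):
    """Total variation distance between train and test answer priors, per question type."""
    p_train, p_test = answer_priors(train), answer_priors(test)
    shift = {}
    # Only question types present in both splits can be compared.
    for qtype in p_train.keys() & p_test.keys():
        answers = p_train[qtype].keys() | p_test[qtype].keys()
        shift[qtype] = 0.5 * sum(
            abs(p_train[qtype].get(a, 0.0) - p_test[qtype].get(a, 0.0))
            for a in answers
        )
    return shift


# Toy records invented for illustration; they are not taken from VQA-CP.
train = [
    {"question_type": "what color", "answer": "white"},
    {"question_type": "what color", "answer": "white"},
    {"question_type": "what color", "answer": "white"},
    {"question_type": "what color", "answer": "black"},
]
test = [
    {"question_type": "what color", "answer": "black"},
    {"question_type": "what color", "answer": "black"},
    {"question_type": "what color", "answer": "black"},
    {"question_type": "what color", "answer": "white"},
]

print(prior_shift(train, test))  # {'what color': 0.5}
```

A value near 0 means the two splits share the same answer prior for that question type, while a value near 1 means the priors barely overlap. A model that only memorizes the most frequent training answer per question type would be penalized as this value grows.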

Dataset
JSON

CLEVR dataset

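The CLEVR dataset is a synthetic diagnostic dataset for visual question answering, in which rendered images of simple 3D objects are paired with automatically generated questions about the objects' attributes, counts, and spatial relationships.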

Dataset
JSON

NLVR2

NLVR2 is a vision-and-language reasoning dataset in which each example pairs a natural language sentence with two images, and the task is to decide whether the sentence is true of the image pair.

Dataset
JSON

Visual Genome

The Visual Genome dataset is a large-scale visual question answering dataset containing over 100,000 images, each with 15-30 annotated entities, attributes, and relationships.

Dataset
JSON
