Visual Reasoning - Groups

VQA-CP

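VQA-CP (Visual Question Answering under Changing Priors) is a re-split of the VQA dataset in which the answer distribution for each question type differs between the training and test sets. It tests whether a model generalizes when it cannot rely on the answer priors seen during training.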
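
To make the idea of a shifted answer distribution concrete, here is a minimal sketch that measures how far apart the training and test answer distributions are for one question type. The record fields (`question_type`, `answer`) and the toy examples are assumptions for illustration, not the dataset's actual schema or contents, and total variation distance is just one reasonable choice of measure.

```python
from collections import Counter


def answer_distribution(examples):
    """Map each question type to its normalized answer frequencies."""
    counts = {}
    for ex in examples:
        # Field names are hypothetical; adapt them to the real annotation files.
        counts.setdefault(ex["question_type"], Counter())[ex["answer"]] += 1
    return {
        qtype: {ans: n / sum(c.values()) for ans, n in c.items()}
        for qtype, c in counts.items()
    }


def total_variation(p, q):
    """Total variation distance: 0 for identical distributions, 1 for disjoint ones."""
    answers = set(p) | set(q)
    return 0.5 * sum(abs(p.get(a, 0.0) - q.get(a, 0.0)) for a in answers)


# Toy data (made up): "white" dominates in training, "black" dominates in test.
train = [
    {"question_type": "what color", "answer": "white"},
    {"question_type": "what color", "answer": "white"},
    {"question_type": "what color", "answer": "white"},
    {"question_type": "what color", "answer": "black"},
]
test = [
    {"question_type": "what color", "answer": "black"},
    {"question_type": "what color", "answer": "black"},
    {"question_type": "what color", "answer": "black"},
    {"question_type": "what color", "answer": "white"},
]

train_dist = answer_distribution(train)
test_dist = answer_distribution(test)
shift = total_variation(train_dist["what color"], test_dist["what color"])
print(f"answer shift for 'what color': {shift:.2f}")
```

A model that always predicts the most frequent training answer for a question type would do well on a split with no shift and poorly on one like this, which is the failure mode such a split is meant to expose.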

NLVR2

NLVR2 (Natural Language for Visual Reasoning for Real) pairs two photographs with a natural-language sentence. The task is to decide whether the sentence is true of the image pair.

Visual Genome

Visual Genome is a large-scale dataset of roughly 108,000 images, each densely annotated with objects, attributes, and pairwise relationships, along with region descriptions and question-answer pairs.
