Visual Question Answering - Groups

FigureQA

FigureQA is a visual question answering dataset containing line plots, dot-line plots, bar charts, and pie charts.

Dataset
JSON

TallyQA

The TallyQA dataset is a large-scale, open-ended visual counting dataset that is well suited to studying statistical shortcuts.

Dataset
JSON

VQA

The VQA dataset is a large-scale visual question answering dataset that consists of images paired with questions that require natural language answers.

Dataset
JSON

MovieQA, TVQA, AVSD, EQA, Embodied QA

A collection of datasets for visual question answering, including MovieQA, TVQA, AVSD, EQA, and Embodied QA.

Dataset
JSON

GQA

The GQA dataset is a visual question answering dataset that focuses on compositional question answering and visual reasoning about real-world images.

Dataset
JSON

TGIF-QA

The TGIF-QA dataset consists of 165,165 QA pairs chosen from 71,741 animated GIFs. To evaluate spatiotemporal reasoning ability at the video level, the TGIF-QA dataset designs...

Dataset
JSON

Visual7W dataset

The Visual7W dataset is a visual question answering dataset, which consists of images and corresponding questions.

Dataset
JSON

CLEVR

CLEVR images contain objects characterized by a set of attributes (shape, color, size and material). The questions are grouped into 5 categories: Exist, Count, CompareInteger,...

Dataset
JSON

Visual Genome

The Visual Genome dataset is a large-scale visual question answering dataset, containing over 100,000 images, each with 15-30 annotated entities, attributes, and relationships.

Dataset
JSON

9 datasets found