Visual Question Answering - Groups

RSVQAxBEN

The RSVQAxBEN dataset is a large-scale benchmark dataset for remote sensing visual question answering, based on the BigEarthNet (BEN) archive and containing 590,326 Sentinel-2...

Dataset
JSON

RSVQA-LR

The RSVQA-LR dataset is a large-scale benchmark dataset for remote sensing visual question answering, constructed using 7 Sentinel-2 tiles acquired over the Netherlands, from...

Dataset
JSON

Conceptual Captions

The dataset used in the paper "Scaling Laws of Synthetic Images for Model Training". The dataset is used for supervised image classification and zero-shot classification tasks.

Dataset
JSON

NLVR2

The dataset used in the paper is a set of sequential vision-and-language tasks, where each task consists of an image and a text input.

Dataset
JSON

Measuring Machine Intelligence through Visual Question Answering

Measuring machine intelligence through visual question answering.

Dataset
JSON

VQA: Visual Question Answering

Visual Question Answering (VQA) has emerged as a prominent multi-discipline research problem in both academia and industry.

Dataset
JSON

Hierarchical Question-Image Co-Attention for Visual Question Answering

A number of recent works have proposed attention models for Visual Question Answering (VQA) that generate spatial maps highlighting image regions relevant to answering the...

Dataset
JSON

LXMERT

The LXMERT dataset is used for visual question answering task. It uses pre-trained weights provided by Tan and Bansal (2019) and fine-tunes it with adaptive approaches mentioned...

Dataset
JSON

VQA 2.0

The VQA 2.0 dataset is used for visual question answering task. It consists of three sets with a train set containing 83k images and 444k questions, a validation set containing...

Dataset
JSON

LLaVA-1.5

The dataset used in this paper is a multimodal large language model (LLaMA) dataset, specifically LLaVA-1.5, which consists of 7 billion parameters and is used for multimodal...

Dataset
JSON

SBU Captions

The SBU Captions dataset is a large-scale image-text dataset used for vision-language pre-training.

Dataset
JSON

Amazon Berkeley Objects Dataset (ABO)

The Amazon Berkeley Objects Dataset (ABO) is a public available e-commerce dataset with multiple images per product.

Dataset
JSON

CLEVR

CLEVR images contain objects characterized by a set of attributes (shape, color, size and material). The questions are grouped into 5 categories: Exist, Count, CompareInteger,...

Dataset
JSON

Visual Genome

The Visual Genome dataset is a large-scale visual question answering dataset, containing 1.5 million images, each with 15-30 annotated entities, attributes, and relationships.

Dataset
JSON

MS-COCO

Large scale datasets [18, 17, 27, 6] boosted text conditional image generation quality. However, in some domains it could be difficult to make such datasets and usually it could...

Dataset
JSON

COCO-QA

The COCO-QA dataset is used for visual question answering task. It consists of 123,287 images and 78,736 train and 38,948 test questions.

Dataset
JSON

Microsoft COCO

The Microsoft COCO dataset was used for training and evaluating the CNNs because it has become a standard benchmark for testing algorithms aimed at scene understanding and...

Dataset
JSON

MSCOCO

Human Pose Estimation (HPE) aims to estimate the position of each joint point of the human body in a given image. HPE tasks support a wide range of downstream tasks such as...

Dataset
JSON

58 datasets found