Visual Question Answering - Groups

VQA-CP v2

This paper proposes VQA-CP v2, a standard out-of-distribution (OOD) benchmark for visual question answering (VQA).

Dataset
JSON

Compressing and Debiasing Vision-Language Pre-Trained Models for Visual Quest...

This paper investigates whether a vision-language pre-trained model (VLP) can be compressed and debiased simultaneously by searching for sparse and robust subnetworks.

Dataset
JSON

Conceptual Captions 12M

The Conceptual Captions 12M (CC-12M) dataset consists of 12 million diverse and high-quality images paired with descriptive captions and titles.

Dataset
JSON

Sort-of-CLEVR

The dataset used in the paper is Sort-of-CLEVR, a visual question answering dataset.

Dataset
JSON

VQA-CP v2 and VQA 2.0

The datasets used in the paper are VQA-CP v2 and VQA 2.0, two standard datasets for visual question answering.

Dataset
JSON

Meta-VQA

The Meta-VQA dataset is a modification of the VQA v2.0 dataset for Visual-Question-Answering, composed of 1234 unique tasks (questions), split into 870 training tasks and 373...

Dataset
JSON

CLEVR dataset

The CLEVR dataset is a visual question answering dataset in which each image is annotated with a question.

Dataset
JSON

Visual7W dataset

The Visual7W dataset is a visual question answering dataset consisting of images and corresponding questions.

Dataset
JSON

VQAv2

Visual Question Answering (VQA) has achieved great success thanks to the fast development of deep neural networks (DNNs). On the other hand, data augmentation, as one of the...

Dataset
JSON

Extended RSVQAxBEN

The extended RSVQAxBEN dataset is an extension of the RSVQAxBEN dataset, including all the spectral bands of Sentinel-2 images with 10m and 20m spatial resolution.

Dataset
JSON

RSVQAxBEN

The RSVQAxBEN dataset is a large-scale benchmark dataset for remote sensing visual question answering, based on the BigEarthNet (BEN) archive and containing 590,326 Sentinel-2...

Dataset
JSON

RSVQA-LR

The RSVQA-LR dataset is a large-scale benchmark dataset for remote sensing visual question answering, constructed using 7 Sentinel-2 tiles acquired over the Netherlands, from...

Dataset
JSON

Conceptual Captions

This dataset is used in the paper "Scaling Laws of Synthetic Images for Model Training" for supervised image classification and zero-shot classification tasks.

Dataset
JSON

NLVR2

The dataset used in the paper is a set of sequential vision-and-language tasks, where each task consists of an image and a text input.

Dataset
JSON

Measuring Machine Intelligence through Visual Question Answering

Measuring machine intelligence through visual question answering.

Dataset
JSON

VQA: Visual Question Answering

Visual Question Answering (VQA) has emerged as a prominent multi-discipline research problem in both academia and industry.

Dataset
JSON

Hierarchical Question-Image Co-Attention for Visual Question Answering

A number of recent works have proposed attention models for Visual Question Answering (VQA) that generate spatial maps highlighting image regions relevant to answering the...

Dataset
JSON

LXMERT

The LXMERT dataset is used for the visual question answering task. It uses pre-trained weights provided by Tan and Bansal (2019) and fine-tunes them with adaptive approaches mentioned...

Dataset
JSON

VQA 2.0

The VQA 2.0 dataset is used for the visual question answering task. It consists of three sets: a train set containing 83k images and 444k questions, a validation set containing...

Dataset
JSON

LLaVA-1.5

The dataset used in this paper is a multimodal large language model (LLaMA) dataset, specifically LLaVA-1.5, which has 7 billion parameters and is used for multimodal...

Dataset
JSON

68 datasets found